Abstract

We propose a unified framework for deriving and studying soft-in-soft-out (SISO) detection
in interference channels using the concept of variational inference. The proposed framework may
be used in multiple-access interference (MAI), inter-symbol interference (ISI), and multiple-input
multiple-output (MIMO) channels. Without loss of generality, we will focus our attention on
turbo multiuser detection, to facilitate a more concrete discussion. It is shown that, with some
loss of optimality, variational inference avoids the exponential complexity of a posteriori
probability (APP) detection by optimizing a closely-related, but much more manageable, objective
function called variational free energy. In addition to its systematic appeal, there are several
other advantages to this viewpoint. First of all, it provides unified and rigorous justifications
for numerous detectors that were proposed on radically different grounds, and facilitates
convenient joint detection and decoding (utilizing the turbo principle) when error-control codes are
incorporated. Secondly, efficient joint parameter estimation and data detection is possible via
the variational expectation maximization (EM) algorithm, such that the detrimental effect of
inaccurate channel knowledge at the receiver may be dealt with systematically. We are also
able to extend BPSK-based SISO detection schemes to arbitrary square QAM constellations in
a rigorous manner using a variational argument.

1 Introduction 

Following the discovery of turbo codes [1] , the principle of turbo processing has been used in various 
signal processing settings. Among these, turbo detection for coded transmission in interference 
channels, which treats the error control code as the outer code and the interference channel as
the inner code, has been shown to perform dramatically better than the conventional non-iterative
method of interference suppression followed by hard-decision decoding. Depending on the channel
of interest, turbo detection includes turbo multiuser detection for multiple access channels [2], [3],
turbo equalization for inter-symbol interference (ISI) channels [4], [5], and turbo MIMO equalization
for multiple-input multiple-output (MIMO) channels [6], [7]. Due to the linear Gaussian vector
channel model that is common to these problems, techniques developed in one area can often be 
readily applied to another with only minor modifications. In this paper, we will restrict our signal 
model to the multiuser detection (MUD) scenario. It should be understood that the solutions 
proposed for this particular problem may be generalized to turbo equalization and turbo MIMO 
detection settings as well. 

The evolution of MUD research has seen detectors being derived through many different
approaches, such as the minimization of mean-squared error (MMSE), decision-feedback, or
multistage interference cancellation (IC) [8]. Within the past decade, there has been a growing interest
in coded CDMA systems, where the need for joint detection and decoding leads to a different class 
of multiuser detectors, namely turbo multiuser detectors. Practical turbo multiuser detectors pro- 
posed in [2] and [3] are among the earliest and most celebrated ones, due to their simplicity and 
remarkable performance. 

Inside a turbo multiuser detector, a soft-in-soft-out (SISO) detector component is of crucial 
importance, and is where the main design challenges lie. It differs from the conventional detectors in 
that it must be able to make use of prior knowledge of the symbols to be detected, and the structure 
of the multiple access channel, to generate soft symbol decisions. Unfortunately, unlike the decoder 
component, for which feasible, low-complexity a posteriori probability (APP) generators (e.g., the 
BCJR algorithm [9] for convolutional codes) may be assumed, the optimal APP multiuser detector 
has exponential complexity and is infeasible. As a result, suboptimal SISO MUD design is key to 
the success of a practical turbo multiuser detector. 

In this paper, we propose a generalized method for the design of a SISO MUD,
adopting a technique called variational inference [10, pp. 422-436], which, like the sum-product
algorithm [11], is an approximate inference algorithm in probabilistic models. We will see that this
approach not only successfully includes some important existing SISO MUD schemes as special 
cases, but easily leads to various improvements and extensions. Although our study focuses on 
SISO MUD by treating it as an approximate inference engine, it also encompasses uncoded MUD 
(detectors with no prior information and only hard decision output), since uncoded MUD can be 
viewed as SISO MUD with uniform prior distributions for the channel symbols. 

Prior to this paper, recent attempts to provide a unified approach to studying the wide range
of multiuser detectors include, to name a few, [12], [13] and [14]. Boutros and Caire [12] generalize
iterative multiuser joint decoding as an approximate sum-product algorithm in a factor graph
containing both the multiuser channel and code constraints. Such a generalization leads to elegant
performance analysis through density evolution. Tanaka [13] and Guo and Verdú [14] view the
uncoded linear and optimal multiuser detectors as posterior mean estimators of the Bayes retrochannel
such that, in the large system limit, the bit error rate (BER) may be evaluated through techniques
from statistical physics. This paper may be regarded as an extension of [13] and [14] into the realm
of nonlinear (and iterative) detectors. Specifically, we show that such detectors arise from
approximating the posterior distributions and iteratively optimizing the approximate distributions, and
address the design challenges of the MUD component within the iterative multiuser joint decoding
problem, highlighted in [12].

The implications of this new generalized framework are significant in at least three ways: 

1. Theoretical Justification for Existing Multiuser Detectors: Section 4 introduces the variational
inference formulation for MUD, in which a quantity known as variational free energy is
constructed and minimized, generating a procedure termed variational free energy minimization
(VFEM). From this perspective, we will show how various uncoded linear multiuser detectors
(e.g., decorrelating and MMSE detectors), as well as their interference cancellation extensions
(e.g., unconstrained or clipped successive interference cancellation (SIC) detectors) may be
derived. We will further argue that the VFEM approach naturally produces SISO multiuser
detectors that can be used in turbo MUD. In particular, we will examine the celebrated
algorithms proposed in [2] and [3], to reveal that they can both be derived with the VFEM
approach.

2. Channel Parameter Joint Estimation Using Variational EM Algorithm: Section 5 considers
the scenario where certain channel parameters are unknown or inaccurately estimated at 
the multiuser receiver, motivating the joint estimation of channel parameters together with 
unknown data symbols. The VFEM framework offers a natural solution to this problem. By 
iteratively minimizing the free energy over both the data symbols and the channel parameters, 
we arrive at the variational EM algorithm [15]. This is a generalized EM algorithm with 
exact inference in the E step replaced by variational inference. As examples of this parameter 
estimation mechanism, we will demonstrate how the unknown channel noise variance may be 
iteratively estimated, and inaccurate channel amplitude refined, in conjunction with turbo 
MUD. 

3. Generalization of BPSK MUD to Square QAM Modulation: In bandwidth-constrained
channels, extensions of the SISO multiuser detectors from BPSK modulation to square QAM
modulation may also be carried out within the VFEM framework. These extensions are not
ad hoc, but optimal in the sense that the variational free energy modified for M-QAM
modulation is minimized. Such a scheme gives rise to an iterative detection technique for general
linear Gaussian channels, called Bit-Level Equalization and Soft Detection (BLESD). It was
introduced in separate works of ours [16], [17].

The rest of the paper is organized as follows: Section 2 describes the multiple access
channel model and formulates the optimal SISO multiuser detectors; Section 3 discusses the
decoding/detection scheduling issue by studying the factor graph containing both the multiuser channel
and code constraints. This will prove to be an important design parameter in the subsequent
analysis of variational-inference-based detectors. Sections 4 and 5 contain the introduction and
application examples of the proposed variational inference framework for MUD, and in two
directions (the first two points summarized above) justify the merits of this new point of view; Section
6 presents some simulation results, and Section 7 concludes the paper.

Notation: Upper and lower case bold face letters indicate matrices and column vectors,
respectively; $\mathbf{1}$ represents the all-one column vector; $\mathbf{X} \circ \mathbf{Y}$ stands for the Schur product
(element-wise product) of matrices $\mathbf{X}$ and $\mathbf{Y}$; $\mathrm{tr}(\mathbf{X})$ denotes the trace of a square matrix $\mathbf{X}$;
$\mathrm{diag}(\mathbf{x})$ is a diagonal matrix with the vector $\mathbf{x}$ on its diagonal; $\mathrm{diag}(\mathbf{X})$ is a diagonal matrix
with the diagonal elements of square matrix $\mathbf{X}$ on its diagonal; $E(\cdot)$ and $V(\cdot)$ stand for the expected
value and variance of a random variable; $\mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma})$ represents a Gaussian pdf with mean
$\boldsymbol{\mu}$ and covariance matrix $\boldsymbol{\Sigma}$.

2 System Description 

2.1 Signal Model for BPSK Modulation 

Consider a synchronous DS-CDMA wireless link with K users. Assuming flat fading, by sampling
the chip matched filter output at chip rate, the received signal in one symbol interval,
$\mathbf{r} \in \mathbb{R}^{N \times 1}$, can be written in the well-known vector form:

$$\mathbf{r} = \mathbf{S}\mathbf{A}\mathbf{b} + \mathbf{n}, \tag{1}$$

where $\mathbf{S} = [\mathbf{s}_1, \mathbf{s}_2, \cdots, \mathbf{s}_K]$ is the spreading code matrix containing the normalized spreading
sequences of the K active users, $\mathbf{A} = \mathrm{diag}(A_1, A_2, \cdots, A_K)$ is the channel matrix representing each
user's signal amplitude, and $\mathbf{b} = [b_1, b_2, \cdots, b_K]^T$ contains the transmitted BPSK channel symbols
from each user. $\mathbf{n}$ is a white Gaussian noise vector with distribution $p(\mathbf{n}) = \mathcal{N}(\mathbf{0}, \sigma^2 \mathbf{I})$.

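After bit-level matched filtering at the receiver, we may write the matched filter output,
$\mathbf{y} \in \mathbb{R}^{K \times 1}$, as:

$$\mathbf{y} = \mathbf{S}^T \mathbf{r} = \mathbf{R}\mathbf{A}\mathbf{b} + \mathbf{z}, \tag{2}$$

where $\mathbf{R} = \mathbf{S}^T \mathbf{S}$ is the symmetric normalized signature correlation matrix with unit diagonal
elements, and $\mathbf{z}$ is a coloured Gaussian noise vector with distribution $p(\mathbf{z}) = \mathcal{N}(\mathbf{0}, \sigma^2 \mathbf{R})$.

The correlated noise statistics in $\mathbf{y}$ may be whitened by applying a noise whitening filter $\mathbf{F}^{-T}$,
yielding

$$\bar{\mathbf{y}} = \mathbf{F}^{-T} \mathbf{y} = \mathbf{F}\mathbf{A}\mathbf{b} + \bar{\mathbf{n}}, \tag{3}$$

where $\mathbf{F}$ is a lower triangular matrix (i.e., $F_{ij} = 0$ for $i < j$) resulting from the Cholesky factorization
of $\mathbf{R}$, $\mathbf{R} = \mathbf{F}^T \mathbf{F}$, and $\bar{\mathbf{n}}$ is a white Gaussian noise vector with the same statistics as $\mathbf{n}$,
i.e., $p(\bar{\mathbf{n}}) = \mathcal{N}(\mathbf{0}, \sigma^2 \mathbf{I})$.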

As $\mathbf{y}$ and $\bar{\mathbf{y}}$ are sufficient statistics for detecting $\mathbf{b}$, equations (1), (2) and (3) are equivalent
starting points for the derivation of multiuser detectors, although certain computational savings
are easier to identify with certain models.
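To make models (1) and (2) concrete, the following Python sketch builds a small synchronous CDMA example and checks numerically that the matched filter output satisfies y = RAb + z. All numerical values (two users, four chips, the chip patterns, amplitudes and noise level) are assumptions chosen purely for illustration, and the helper names are hypothetical.

```python
import math
import random

def transpose(M):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*M)]

def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    Yt = transpose(Y)
    return [[sum(p * q for p, q in zip(row, col)) for col in Yt] for row in X]

# Assumed toy values: K = 2 users, N = 4 chips per symbol.
N, K = 4, 2
chips = [[1, 1], [1, 1], [1, 1], [1, -1]]                 # N x K, entries +/-1
S = [[c / math.sqrt(N) for c in row] for row in chips]     # unit-norm columns
A = [1.0, 0.5]                                             # diagonal of A
b = [1, -1]                                                # BPSK symbols
sigma = 0.1                                                # noise standard deviation

rng = random.Random(0)
Ab = [a * x for a, x in zip(A, b)]
n = [rng.gauss(0.0, sigma) for _ in range(N)]
r = [s + w for s, w in zip(matvec(S, Ab), n)]              # received signal, eq. (1)

St = transpose(S)
R = matmul(St, S)                                          # R = S^T S
y = matvec(St, r)                                          # matched filter output S^T r
z = matvec(St, n)                                          # coloured noise S^T n
y_model = [u + v for u, v in zip(matvec(R, Ab), z)]        # R A b + z, eq. (2)
```

With these chip patterns the two signatures have correlation 0.5, so `R` has unit diagonal and off-diagonal entries of 0.5, and `y` agrees with `y_model` up to rounding.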

Note that the channel model for frequency selective and asynchronous channels takes a similar
linear form as (1). Thus the adaptation to these more general channel types is possible, but will
not be discussed explicitly here. Interested readers may refer to, e.g., [2], for further insights.

2.2 Optimal SISO Detectors 

Given the prior distribution $p(\mathbf{b})$ and the conditional distribution $p(\mathbf{r}|\mathbf{b})$, the jointly optimal (JO)
detector uses Bayes rule to compute

$$p(\mathbf{b}|\mathbf{r}) = \frac{p(\mathbf{r}|\mathbf{b})\,p(\mathbf{b})}{\sum_{\mathbf{b}} p(\mathbf{r}|\mathbf{b})\,p(\mathbf{b})}. \tag{4}$$

The posterior distribution $p(\mathbf{b}|\mathbf{r})$ is the "soft output" of the jointly optimal detector; hard decisions
are obtained by maximizing over all possible symbol vectors $\mathbf{b}$.

Similarly, the individually optimal (IO) detector is obtained by evaluating the marginal posterior
distribution of $b_k$ ($k = 1$ to $K$):

$$p(b_k|\mathbf{r}) = \frac{p(\mathbf{r}|b_k)\,p(b_k)}{\sum_{b_k} p(\mathbf{r}|b_k)\,p(b_k)}, \tag{5}$$

where $p(\mathbf{r}|b_k)\,p(b_k) = \sum_{\mathbf{b} \setminus b_k} p(\mathbf{r}|\mathbf{b})\,p(\mathbf{b})$. Due to the discrete nature of the information symbols,
both jointly optimal and individually optimal detectors require prohibitive exponential complexity.
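The exponential cost of (4) and (5) is easy to see in a brute-force implementation. The sketch below is a minimal illustration, assuming independent BPSK priors and the white-noise model (1); the function names and argument layout are hypothetical and not taken from the paper.

```python
import itertools
import math

def likelihood(r, S, A, b, sigma):
    """p(r|b) up to a constant factor, for r = S A b + n with white Gaussian n."""
    K = len(b)
    mean = [sum(S[i][k] * A[k] * b[k] for k in range(K)) for i in range(len(r))]
    dist2 = sum((ri - mi) ** 2 for ri, mi in zip(r, mean))
    return math.exp(-dist2 / (2.0 * sigma ** 2))

def exact_posteriors(r, S, A, priors, sigma):
    """Return the joint posterior (4) and the marginals p(b_k = +1 | r) from (5).

    priors[k] is p(b_k = +1); symbols are assumed independent a priori.
    The loop visits all 2^K symbol vectors, hence the exponential complexity.
    """
    K = len(A)
    joint = {}
    for b in itertools.product((+1, -1), repeat=K):
        prior = math.prod(priors[k] if b[k] == 1 else 1.0 - priors[k] for k in range(K))
        joint[b] = likelihood(r, S, A, b, sigma) * prior
    evidence = sum(joint.values())                      # denominator of (4)
    joint = {b: w / evidence for b, w in joint.items()}
    marginals = [sum(w for b, w in joint.items() if b[k] == 1) for k in range(K)]
    return joint, marginals
```

A jointly optimal hard decision is then `max(joint, key=joint.get)`, while the individually optimal decision for user k compares `marginals[k]` with 0.5.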

The individually optimal detector is the optimal SISO multiuser detector in terms of minimizing
bit error rate (BER). Practical suboptimal SISO multiuser detectors may be derived by taking in
the prior information $p(b_k)$ and producing a posterior probability $p(b_k|\mathbf{r})$ or $p(b_k|\mathbf{y})$ through some
intelligent approximation which does not have exponential complexity. Variational inference is
one example of these "intelligent approximations", where the outcome, $Q(b_k)$, which approximates
$p(b_k|\mathbf{r})$, is found by optimizing an underlying cost function called variational free energy, as will be
shown in Section 4.

3 Message-Passing Scheduling in Turbo Multiuser Detection



In a turbo multiuser detector, the detector section needs to be able to accept prior estimates
$\{p(b_k)\}_{k=1}^{K}$ from the APP decoder and generate a soft decision, called extrinsic information (EXT),
to be sent back to the APP decoder. Such a mechanism for EXT exchange can be rigorously justified
as the message passing algorithm in graphs [18], [12]. However, since any practical multiuser detector
is at best an approximation to the exact sum-product algorithm (because exact inference, with the
individually optimal detector, is NP-complete), good methods to generate and pass EXT are not
unique.

In addition, the factor graph describing the statistical dependencies among all unknowns (con- 
ditioned on the observations) contains cycles, and hence several message passing schedules are 
valid. In this section we describe the sequential, flooding and hybrid schedules, and show that 
the Wang-Poor algorithm corresponds to a hybrid scheduling. The sequential schedule takes K 
times as long as the flooding schedule, but may result in fewer iterations to achieve a given level 
of performance. While message-passing scheduling has not been thoroughly studied in the turbo 
MUD context, it is an important topic in iterative decoding of error control codes. For example, 
the different convergence rate of sequential and parallel (flooding) scheduling for decoding LDPC 
codes has been reported in [19]. 





Figure 1: Graphical model of a coded multiuser channel. Note the time dependency among bits of 
the same user (code constraint), and the user dependency among bits at the same time (channel 
constraint). 

From Fig. 1 it is seen that the nodes representing the channel bits $\{b_{t,k}\}_{k=1}^{K}$ are the relay nodes
that separate the graph into two halves, where on one side the decoder runs belief propagation to
perform per-user APP decoding and on the other side the multiuser detector performs variational
inference. The process by which the APP decoder retrieves prior information and generates extrinsic
information is standard (see [9]) and will be skipped. We will therefore only discuss message passing
between the detector and decoder.



3.1 Obtaining Extrinsic Information: Sequential Schedule

Figure 2: An instance of sequential message-passing in the graphical model: the multiuser detector
receives prior distributions of $b_2$, $b_3$ and $b_4$ to generate the extrinsic information for $b_1$. This process
is repeated for $b_2$, $b_3$ and $b_4$ to complete one message-passing iteration.



When a SISO detector is viewed as an approximate sum-product algorithm [12], the EXT may
be obtained in a way analogous to the message-passing rule in graphs. Fig. 2 provides an example
that demonstrates that the EXT for $b_1$ may be generated using the priors of $b_2$, $b_3$ and $b_4$, but not
the prior of $b_1$. In its exact form the message (EXT) from node $f$ to node $b_1$ is

$$M_{f \to b_1} = \sum_{b_2, b_3, b_4} p(\mathbf{r}|\mathbf{b})\,p(b_2)\,p(b_3)\,p(b_4) = p(\mathbf{r}|b_1). \tag{6}$$

In sequential scheduling, $M_{f \to b_1}$ will be passed into the APP decoder for user 1, which will generate
a new prior for $b_1$ that will be used for EXT generation for $b_2$, and so on. So error control decoding
is performed one user at a time, and not in parallel.

In an approximate evaluation of EXT for $b_k$ that follows the same vein, one would ignore the
prior of $b_k$ even if it is available from a previous iteration, and use a simple multiuser detector
such as linear MMSE to generate an estimated $p(\mathbf{r}|b_k)$ using only $\{p(b_l)\}_{l \neq k}$. Thus in the sequential
schedule,

• the EXT for each bit is obtained using different inputs (prior distributions), necessitating a
substantially different EXT generator (multiuser detector) for each bit; and

• the prior knowledge of $b_k$ is ignored before detection in generating the EXT for $b_k$.

The sequential schedule to obtain extrinsic information is intuitive, since it resembles the 
message-passing protocol defined in the sum-product algorithm [20, ch. 4]. But it is also very
restrictive, in that users have to be detected in series, introducing latency in the detection process. 
Furthermore, since a different joint detector must be devised for each user, the overall complexity 
in general increases linearly with K if no simplification measures are taken. 

3.2 Obtaining Extrinsic Information: Flooding Schedule 

In the flooding schedule, illustrated in Fig. 3, EXT's for all bits are generated in parallel. The
message from node $f$ to $b_k$ will be

$$M_{f \to b_k} = \sum_{\mathbf{b} \setminus b_k} p(\mathbf{r}|\mathbf{b}) \prod_{l \neq k} p(b_l) = p(\mathbf{r}|b_k). \tag{7}$$

Note that, unlike in sequential scheduling, all EXT's use the same priors. For instance, $M_{f \to b_2}$
and $M_{f \to b_4}$ both use $p(b_3)$, whereas in the sequential schedule, $M_{f \to b_4}$ would use $p^{\mathrm{new}}(b_3)$ from the
most recent round of APP decoding.

As well, we can write the EXT of $b_k$ as

$$M_{f \to b_k} = \frac{1}{p(b_k)} \sum_{\mathbf{b} \setminus b_k} p(\mathbf{r}|\mathbf{b}) \prod_{l=1}^{K} p(b_l), \tag{8}$$

and hence view the flooding schedule as making use of all prior probabilities from the same iteration.
This reasoning, together with (7), leads to the following sub-optimal approximation:

• Use all prior probabilities from the same iteration to generate an approximate $p(b_k|\mathbf{r})$, say
$Q(b_k)$;

• Form the EXT for $b_k$ by dividing $Q(b_k)$ by $p(b_k)$;

• Send all EXT's to the K APP decoders in parallel.
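The equivalence behind this recipe can be checked numerically for a small system. The sketch below is an illustration only: it uses exact enumeration in place of the approximate Q(b_k), assumes independent BPSK priors strictly between 0 and 1, and compares the message computed without the prior of b_k with the "divide the posterior by the prior" form. All function names are hypothetical.

```python
import itertools
import math

def joint_weights(r, S, A, priors, sigma, skip=None):
    """Weights p(r|b) * prod_l p(b_l) over all b; the prior of user `skip` is left out."""
    K = len(A)
    weights = {}
    for b in itertools.product((+1, -1), repeat=K):
        mean = [sum(S[i][k] * A[k] * b[k] for k in range(K)) for i in range(len(r))]
        dist2 = sum((ri - mi) ** 2 for ri, mi in zip(r, mean))
        w = math.exp(-dist2 / (2.0 * sigma ** 2))
        for l in range(K):
            if l != skip:
                w *= priors[l] if b[l] == 1 else 1.0 - priors[l]
        weights[b] = w
    return weights

def normalise(pair):
    """Scale a pair of non-negative weights so that they sum to one."""
    total = pair[0] + pair[1]
    return (pair[0] / total, pair[1] / total)

def ext_direct(r, S, A, priors, sigma, k):
    """Message (7): marginalise p(r|b) against the priors of the other users only."""
    w = joint_weights(r, S, A, priors, sigma, skip=k)
    plus = sum(v for b, v in w.items() if b[k] == 1)
    minus = sum(v for b, v in w.items() if b[k] == -1)
    return normalise((plus, minus))

def ext_flooding(r, S, A, priors, sigma, k):
    """Flooding recipe: form the posterior of b_k from all priors, then divide by p(b_k)."""
    w = joint_weights(r, S, A, priors, sigma)
    q_plus = sum(v for b, v in w.items() if b[k] == 1)
    q_minus = sum(v for b, v in w.items() if b[k] == -1)
    return normalise((q_plus / priors[k], q_minus / (1.0 - priors[k])))
```

Under exact inference the two functions return the same normalised message for every user; the schedules start to differ only once the posterior is replaced by an approximation.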

Figure 3: An instance of flooding message-passing in the graphical model: the multiuser detector
receives prior distributions of $b_1, \cdots, b_4$ to generate the extrinsic information for $b_1, \cdots, b_4$. This
completes one message-passing iteration.



The advantage of the flooding schedule is twofold: 1) by passing messages to the detector in
one shot, the latency is low; 2) by generating the extrinsic information in one shot, the complexity
of the detector is reduced.

By implementing the flooding schedule, our MUD design challenge shifts from
approximating $p(\mathbf{r}|b_k)$ to approximating $p(b_k|\mathbf{r})$, which the variational inference viewpoint of MUD
allows us to do easily.



3.3 Obtaining Extrinsic Information: Hybrid Schedule 

A hybrid schedule can be defined in which the EXT for $b_k$ is computed without using $p(b_k)$, as in
the sequential schedule, and all EXT's are computed in parallel, as in the flooding schedule. This
approach removes the latency issue in sequential scheduling, and has been used in the literature
without justification.

If exact inference is used to compute $p(\mathbf{r}|b_k)$ in the hybrid schedule, and $p(b_k|\mathbf{r})$ in the flooding
schedule, the two implementations are identical, since the messages coming out of the MUD section
are the same: $\{p(\mathbf{r}|b_k)\}_{k=1}^{K}$. However, in practical detector design, $p(\mathbf{r}|b_k)$ or $p(b_k|\mathbf{r})$ must be
approximated. As will be demonstrated in Section 4, $p(b_k|\mathbf{r})$ may be approximated as $Q(b_k)$ given
prior distributions $\{p(b_l)\}_{l=1}^{K}$, while $p(\mathbf{r}|b_k)$ may be approximated as $Q(b_k)$ given $\{p(b_l)\}_{l \neq k}$ and
non-informative $p(b_k)$. With these approximations, the hybrid and flooding scheduling schemes differ,
as the former becomes the Wang-Poor turbo detector [3] and the latter turns into a brand-new
design.

4 Multiuser Detection via Variational Inference 

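In [14], Guo and Verdú treat the linear multiuser detectors as posterior mean estimators (PME)
with appropriately postulated distributions $p(\mathbf{b})$ and $p(\mathbf{r}|\mathbf{b})$. For example, if a Gaussian prior is
assumed, i.e., $p(\mathbf{b}) = \mathcal{N}(\mathbf{0}, \mathbf{I})$, and the channel is modelled as $p(\mathbf{r}|\mathbf{b}) = \mathcal{N}(\mathbf{S}\mathbf{A}\mathbf{b}, \hat{\sigma}^2 \mathbf{I})$ with a
postulated noise level $\hat{\sigma}$, the posterior (or conditional) mean estimator, i.e., $E[\mathbf{b}|\mathbf{r}]$, is a generalized
linear detector given by

$$E[\mathbf{b}|\mathbf{r}] = (\mathbf{A}^T \mathbf{S}^T \mathbf{S} \mathbf{A} + \hat{\sigma}^2 \mathbf{I})^{-1} \mathbf{A}^T \mathbf{S}^T \mathbf{r}. \tag{9}$$

By choosing different values for $\hat{\sigma}$, we arrive at different linear detectors. If $\hat{\sigma}^2 = \sigma^2$, we get the
MMSE detector. If $\hat{\sigma} \to 0$, we approach the decorrelating detector. And if $\hat{\sigma} \to \infty$, the matched
filter output is attained (up to a scaling factor).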
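The three regimes can be verified numerically. The sketch below is a minimal illustration, not the authors' code: it evaluates the posterior mean in the equivalent matched-filter form (A R A + σ̂² I)⁻¹ A y, where `sigma_hat` is an assumed name for the postulated noise standard deviation, and `solve` is a small hypothetical Gaussian elimination helper so that no external library is needed.

```python
def solve(M, v):
    """Solve M x = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    aug = [list(M[i]) + [v[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda i: abs(aug[i][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for i in range(c + 1, n):
            f = aug[i][c] / aug[c][c]
            for j in range(c, n + 1):
                aug[i][j] -= f * aug[c][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (aug[i][n] - sum(aug[i][j] * x[j] for j in range(i + 1, n))) / aug[i][i]
    return x

def linear_detector(R, A, y, sigma_hat):
    """Posterior mean (A R A + sigma_hat^2 I)^{-1} A y, with y = S^T r and R = S^T S.

    A holds the diagonal of the amplitude matrix, so A^T S^T S A = A R A
    and A^T S^T r = A y.
    """
    K = len(A)
    M = [[A[i] * R[i][j] * A[j] + (sigma_hat ** 2 if i == j else 0.0)
          for j in range(K)] for i in range(K)]
    return solve(M, [A[k] * y[k] for k in range(K)])
```

For a noise-free observation, a very small `sigma_hat` recovers the transmitted symbols (decorrelator), while for a very large `sigma_hat` the output multiplied by `sigma_hat ** 2` approaches the amplitude-weighted matched filter output A y.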

However, [14] has not considered another important class of detectors, namely the nonlinear
detectors. In this work, we wish to extend the coverage of the posterior mean estimator to nonlinear
detectors by introducing an additional degree of freedom in approximating the posterior
distribution. More specifically, we will not limit ourselves to applying Bayes rule to calculate the posterior,
but instead use the more general and flexible variational inference technique.

4.1 Variational Inference and Variational Free Energy Minimization 

We shall explain the variational inference method specifically in terms of its application to multiuser
detection, while a more general and in-depth treatment, as well as its alternative interpretations
and connection to statistical physics, can be found in [21], [22], and [10].

As stated earlier, the general task of the SISO multiuser detector is to perform inference on $\mathbf{b}$
given the observation $\mathbf{r}$, $\mathbf{y}$ or $\bar{\mathbf{y}}$ (we will simply use $\mathbf{r}$ for now, as it is understood that they are
equivalent). Suppose our objective is the jointly optimal detector; then the distribution of interest
is $p(\mathbf{b}|\mathbf{r})$.¹ Very often, however, the direct evaluation of $p(\mathbf{b}|\mathbf{r})$ is computationally intractable when
Bayes rule is applied directly, in particular, when $p(\mathbf{b})$ is a discrete distribution. In such a case,
the variational inference technique assumes a tractable approximation to $p(\mathbf{b}|\mathbf{r})$, written as $Q(\mathbf{b})$,
where the constant $\mathbf{r}$ is omitted for convenience.

¹ Strictly speaking, the individually optimal detector minimizes the BER. But since the difference is minimal, we may
consider the jointly optimal detector for simplicity.




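A good approximation $Q(\mathbf{b})$ needs to resemble $p(\mathbf{b}|\mathbf{r})$ as closely as possible, and the Kullback-
Leibler divergence (or relative entropy) $D[Q(\mathbf{b}) \| p(\mathbf{b}|\mathbf{r})]$ offers an excellent measure of similarity. But
since the distribution $p(\mathbf{b}|\mathbf{r})$ is difficult to attain as we have assumed, an equivalent alternative,
$p(\mathbf{b}, \mathbf{r}) = p(\mathbf{r}|\mathbf{b})\,p(\mathbf{b})$, is used, and $p(\mathbf{b}, \mathbf{r})$ is called the complete likelihood function. The variational
free energy is thus defined as:

$$\mathcal{F}(\boldsymbol{\lambda}) = \int Q(\mathbf{b}; \boldsymbol{\lambda}) \log \frac{Q(\mathbf{b}; \boldsymbol{\lambda})}{p(\mathbf{b}, \mathbf{r})} \, d\mathbf{b}, \tag{10}$$

which equals $D[Q(\mathbf{b}) \| p(\mathbf{b}|\mathbf{r})]$ up to an additive constant (the integral becomes a sum when $\mathbf{b}$ is
discrete). In (10), $Q(\mathbf{b})$ is written as $Q(\mathbf{b}; \boldsymbol{\lambda})$ to
denote the dependence of $Q(\mathbf{b})$ on $\boldsymbol{\lambda}$ explicitly, where $\boldsymbol{\lambda}$ contains a set of parameters that specify
$Q(\mathbf{b})$. In the rest of the paper, we will however drop the dependence of the Q function on $\boldsymbol{\lambda}$ in
accordance with the usual convention for writing probability distributions.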

If no constraints are placed on $Q(\mathbf{b})$, by minimizing $\mathcal{F}(\boldsymbol{\lambda})$, we reach $Q(\mathbf{b}) = p(\mathbf{b}|\mathbf{r})$ and nothing
is gained. But if we parameterize $Q(\mathbf{b})$ by assuming that it comes from a restricted family of
distributions (for example, a Gaussian), we may very easily find a closed-form expression for $\mathcal{F}(\boldsymbol{\lambda})$,
which leads to a good approximation of $p(\mathbf{b}|\mathbf{r})$ via the minimization of variational free energy. This
method of performing approximate inference is called variational inference.
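The relation between the free energy and the Kullback-Leibler divergence can be checked on a toy discrete example. The sketch below assumes a hypothetical complete likelihood table for two BPSK symbols (the numbers are made up) and uses the discrete-sum form of the free energy; it illustrates that the free energy exceeds the divergence by the constant -log p(r), and that the exact posterior attains the minimum.

```python
import math

def free_energy(Q, joint):
    """F(Q) = sum_b Q(b) log(Q(b) / p(b, r)) for distributions stored as dicts."""
    return sum(q * math.log(q / joint[b]) for b, q in Q.items() if q > 0.0)

def kl_divergence(Q, P):
    """D[Q || P] = sum_b Q(b) log(Q(b) / P(b))."""
    return sum(q * math.log(q / P[b]) for b, q in Q.items() if q > 0.0)

# Hypothetical complete likelihood p(b, r) for a fixed r and two BPSK symbols.
joint = {(1, 1): 0.02, (1, -1): 0.10, (-1, 1): 0.01, (-1, -1): 0.03}
evidence = sum(joint.values())                          # p(r)
posterior = {b: w / evidence for b, w in joint.items()}

# A restricted (factorised) trial distribution Q(b) = Q1(b1) Q2(b2).
q1, q2 = 0.8, 0.3                                       # Q1(b1 = +1), Q2(b2 = +1)
Q = {(s1, s2): (q1 if s1 == 1 else 1 - q1) * (q2 if s2 == 1 else 1 - q2)
     for s1 in (1, -1) for s2 in (1, -1)}

# The free energy and the divergence differ by the constant -log p(r).
gap = free_energy(Q, joint) - kl_divergence(Q, posterior)
```

Because the gap does not depend on Q, minimizing the free energy over a restricted family is the same as minimizing the divergence to the true posterior over that family.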

One important technique often used in variational inference is to assume that $Q(\mathbf{b})$ is
factorizable as $\prod_{k=1}^{K} Q_k(b_k)$ (we shall omit the subscript of $Q_k$ from here on), and to find distributions
$\{Q(b_k)\}_{k=1}^{K}$ that minimize the free energy. This factorization of a distribution and the independence
assumption associated with it is referred to as the mean-field approximation in statistical physics.
A demonstration of its application will be presented in detail in Section 4.5.

The following is an outline of the general procedure for deriving multiuser detectors through
VFEM:

1. Postulation: Assume postulated distributions for $p(\mathbf{b})$, $p(\mathbf{r}|\mathbf{b})$ and $Q(\mathbf{b})$;

2. Evaluation: Derive a closed-form expression for $\mathcal{F}(\boldsymbol{\lambda})$;

3. Optimization: Minimize $\mathcal{F}(\boldsymbol{\lambda})$ (exactly or iteratively) over $\boldsymbol{\lambda}$.

Note that we have now transformed the general MUD problem into a well-defined optimization
problem, with a unique objective function, called variational free energy. This procedure bears close
resemblance to the routine of deriving thermodynamic state equations in statistical physics [23],
which is not surprising, considering the fact that variational inference is indeed rooted in statistical
physics.

4.2 VFEM Interpretation of Linear Multiuser Detectors

We shall begin by deriving linear multiuser detectors from variational free energy minimization, 
and thus show that simply adjusting the postulated distributions p(b), p(r|b) and Q(b) leads to the 
well-known decorrelating and MMSE detectors. Although the exercises presented here are some- 
what trivial, since uncoded linear MUD is the simplest instance of MUD, they lay the foundation 
for more sophisticated variations in later sections. 

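Proposition 1 Decorrelating Detectors may be derived through the VFEM routine by assuming
the following distributions:

$$\begin{cases} p(\mathbf{b}) = \text{constant} \\ p(\mathbf{r}|\mathbf{b}) = \mathcal{N}(\mathbf{S}\mathbf{A}\mathbf{b}, \sigma^2 \mathbf{I}) \\ Q(\mathbf{b}) = \mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma}). \end{cases} \tag{11}$$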

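Proof: Evaluating $\mathcal{F}(\boldsymbol{\lambda})$ as in (10), we have a function of $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$:

$$\mathcal{F}(\boldsymbol{\mu}, \boldsymbol{\Sigma}) = -\frac{1}{2} \log |\boldsymbol{\Sigma}| + \frac{1}{2\sigma^2} \left\{ \boldsymbol{\mu}^T \mathbf{A}^T \mathbf{S}^T \mathbf{S} \mathbf{A} \boldsymbol{\mu} + \mathrm{tr}[(\mathbf{A}^T \mathbf{S}^T \mathbf{S} \mathbf{A}) \boldsymbol{\Sigma}] - 2 \mathbf{r}^T \mathbf{S} \mathbf{A} \boldsymbol{\mu} \right\}. \tag{12}$$

The final estimate of $Q(\mathbf{b})$ is given by the minimizers $\hat{\boldsymbol{\mu}}$ and $\hat{\boldsymbol{\Sigma}}$ of $\mathcal{F}(\boldsymbol{\mu}, \boldsymbol{\Sigma})$. Calculating
$\partial \mathcal{F} / \partial \boldsymbol{\mu}$ and $\partial \mathcal{F} / \partial \boldsymbol{\Sigma}^{-1}$ and equating to zero, we have

$$\begin{aligned} \hat{\boldsymbol{\mu}} &= (\mathbf{A}^T \mathbf{S}^T \mathbf{S} \mathbf{A})^{-1} \mathbf{A}^T \mathbf{S}^T \mathbf{r} \\ \hat{\boldsymbol{\Sigma}} &= \sigma^2 (\mathbf{A}^T \mathbf{S}^T \mathbf{S} \mathbf{A})^{-1}. \end{aligned} \tag{13}$$

If hard decisions are desired, $\hat{\boldsymbol{\mu}}$ can be used as the detector output, since it maximizes $Q(\mathbf{b})$, which
is Gaussian. It is easy to recognize that $\hat{\boldsymbol{\mu}}$ is identical to the decorrelating detector output. ■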

Note that given the postulated priors in (11), the exact posterior $p(\mathbf{b}|\mathbf{r})$ is tractable and is in
fact Gaussian. Therefore, the solved Q function, $\mathcal{N}(\hat{\boldsymbol{\mu}}, \hat{\boldsymbol{\Sigma}})$, is the exact posterior distribution which
could also have been found by applying Bayes rule directly.
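The differentiation step in the proof of Proposition 1 can be spelled out as follows. This is a sketch using the standard identities $\partial \log|\boldsymbol{\Sigma}| / \partial \boldsymbol{\Sigma} = \boldsymbol{\Sigma}^{-1}$ and $\partial\, \mathrm{tr}(\mathbf{M}\boldsymbol{\Sigma}) / \partial \boldsymbol{\Sigma} = \mathbf{M}$ for symmetric matrices; differentiating with respect to $\boldsymbol{\Sigma}$ rather than $\boldsymbol{\Sigma}^{-1}$ is a choice made here for brevity and gives the same stationary point.

```latex
\begin{align*}
\frac{\partial \mathcal{F}}{\partial \boldsymbol{\mu}}
  &= \frac{1}{\sigma^2}\left(\mathbf{A}^T\mathbf{S}^T\mathbf{S}\mathbf{A}\,\boldsymbol{\mu}
     - \mathbf{A}^T\mathbf{S}^T\mathbf{r}\right) = \mathbf{0}
  &&\Rightarrow\quad
  \hat{\boldsymbol{\mu}} = (\mathbf{A}^T\mathbf{S}^T\mathbf{S}\mathbf{A})^{-1}\mathbf{A}^T\mathbf{S}^T\mathbf{r}, \\
\frac{\partial \mathcal{F}}{\partial \boldsymbol{\Sigma}}
  &= -\frac{1}{2}\boldsymbol{\Sigma}^{-1}
     + \frac{1}{2\sigma^2}\mathbf{A}^T\mathbf{S}^T\mathbf{S}\mathbf{A} = \mathbf{0}
  &&\Rightarrow\quad
  \hat{\boldsymbol{\Sigma}} = \sigma^2(\mathbf{A}^T\mathbf{S}^T\mathbf{S}\mathbf{A})^{-1}.
\end{align*}
```

Both stationary points are minima because the free energy is a convex quadratic in $\boldsymbol{\mu}$ and, through the $-\log|\boldsymbol{\Sigma}|$ term, convex in $\boldsymbol{\Sigma}$.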

The decorrelating detector uses non-informative priors for the data bits transmitted, by setting
$p(\mathbf{b})$ to a constant. But in practice, side information is available. For instance, $\{b_k\}_{k=1}^{K}$ can be
safely assumed to be i.i.d. and zero mean. For BPSK signaling, in particular, we also know that
$E(b_k^2) = 1$. We will subsequently show that the Gaussian approximation about $p(\mathbf{b})$, utilizing the
first and second order statistics of $\mathbf{b}$, gives rise to the familiar MMSE detector.

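Proposition 2 MMSE Multiuser Detectors may be derived through the VFEM routine by assuming
the following distributions:

$$\begin{cases} p(\mathbf{b}) = \mathcal{N}(\mathbf{0}, \mathbf{I}) \\ p(\mathbf{r}|\mathbf{b}) = \mathcal{N}(\mathbf{S}\mathbf{A}\mathbf{b}, \sigma^2 \mathbf{I}) \\ Q(\mathbf{b}) = \mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma}). \end{cases} \tag{14}$$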

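Proof: Evaluating $\mathcal{F}(\boldsymbol{\lambda})$ yields a function of $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$:

$$\mathcal{F}(\boldsymbol{\mu}, \boldsymbol{\Sigma}) = -\frac{1}{2} \log |\boldsymbol{\Sigma}| + \frac{1}{2\sigma^2} \left\{ \boldsymbol{\mu}^T \mathbf{A}^T \mathbf{S}^T \mathbf{S} \mathbf{A} \boldsymbol{\mu} + \mathrm{tr}[(\mathbf{A}^T \mathbf{S}^T \mathbf{S} \mathbf{A}) \boldsymbol{\Sigma}] - 2 \mathbf{r}^T \mathbf{S} \mathbf{A} \boldsymbol{\mu} \right\} + \frac{1}{2} \left\{ \boldsymbol{\mu}^T \boldsymbol{\mu} + \mathrm{tr}(\boldsymbol{\Sigma}) \right\}. \tag{15}$$

Solving $\partial \mathcal{F} / \partial \boldsymbol{\mu} = \mathbf{0}$ and $\partial \mathcal{F} / \partial \boldsymbol{\Sigma}^{-1} = \mathbf{0}$ leads to the following solution:

$$\begin{aligned} \hat{\boldsymbol{\mu}} &= (\mathbf{A}^T \mathbf{S}^T \mathbf{S} \mathbf{A} + \sigma^2 \mathbf{I})^{-1} \mathbf{A}^T \mathbf{S}^T \mathbf{r} \\ \hat{\boldsymbol{\Sigma}} &= \sigma^2 (\mathbf{A}^T \mathbf{S}^T \mathbf{S} \mathbf{A} + \sigma^2 \mathbf{I})^{-1}. \end{aligned} \tag{16}$$

Evidently, $\hat{\boldsymbol{\mu}}$ in (16) can be identified as the MMSE detector output. ■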



Note that the variational inference interpretation of decorrelating and MMSE detectors also
produces a covariance matrix $\hat{\boldsymbol{\Sigma}}$ of the Q function, which is not available through conventional
signal processing techniques. $\hat{\boldsymbol{\Sigma}}$ indicates the reliability of the detector output, something the
hard-decision detector is unable to make use of. But it will prove valuable in SISO detectors, as
demonstrated in Sections 4.4 and 4.5.

4.3 VFEM Interpretation of Interference Cancellation Detectors 

Iterative multiuser detectors, and especially their convergence behavior, have been actively
researched in the past. In [24], linear SIC and PIC are categorized as the Gauss-Seidel and Jacobi
iterations for solving linear equations. SIC is also analyzed in greater depth in [25] and [26]. The
study is later extended to clipped SIC in [27] through the investigation of the variational inequality
(VI) problem. Here we offer an alternative view of SIC as the coordinate descent algorithm applied
to the minimization of $\mathcal{F}(\boldsymbol{\mu}, \boldsymbol{\Sigma})$.

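Proposition 3 Linear/Clipped SIC Detectors may be derived by assuming the same distributions
as in (11) or (14), but minimizing $\mathcal{F}(\boldsymbol{\mu}, \boldsymbol{\Sigma})$ using the coordinate descent algorithm.
That is, in the i-th iteration, for k = 1 to K,

$$\begin{aligned} \min_{\mu_k} \quad & \mathcal{F}\big(\mu_1^{(i)}, \cdots, \mu_{k-1}^{(i)}, \mu_k, \mu_{k+1}^{(i-1)}, \cdots, \mu_K^{(i-1)}, \boldsymbol{\Sigma}\big) \\ \text{s.t.} \quad & \mu_{\min} \le \mu_k \le \mu_{\max}. \end{aligned} \tag{17}$$

The algorithm describes a linear SIC if $\mu_{\min} = -\infty$ and $\mu_{\max} = \infty$, and a clipped SIC otherwise.

Proof: Setting $\partial \mathcal{F}(\boldsymbol{\mu}, \boldsymbol{\Sigma}) / \partial \mu_k = 0$ based on (15) yields

$$A_k \mathbf{s}_k^T \mathbf{S} \mathbf{A} \boldsymbol{\mu} - A_k \mathbf{s}_k^T \mathbf{r} + \sigma^2 \mu_k = 0. \tag{18}$$

Rearranging the terms and defining $\boldsymbol{\mu}_{\setminus k} = [\mu_1, \cdots, \mu_{k-1}, 0, \mu_{k+1}, \cdots, \mu_K]^T$, the optimal $\mu_k$ is then
expressed in the familiar linear interference cancellation form if $\mu_k$ is unbounded (i.e., $\mu_{\min} = -\infty$
and $\mu_{\max} = \infty$):

$$\hat{\mu}_k = \frac{A_k}{A_k^2 + \sigma^2} \mathbf{s}_k^T (\mathbf{r} - \mathbf{S} \mathbf{A} \boldsymbol{\mu}_{\setminus k}). \tag{19}$$

Since updating $\mu_k$ ($k = 1, \cdots, K$) consecutively subject to $\partial \mathcal{F}(\boldsymbol{\mu}, \boldsymbol{\Sigma}) / \partial \mu_k = 0$ is the coordinate
descent algorithm for minimizing $\mathcal{F}(\boldsymbol{\mu}, \boldsymbol{\Sigma})$, (19) corresponds to the coordinate descent
implementation of the MMSE detector. On the other hand, setting $\partial \mathcal{F}(\boldsymbol{\mu}, \boldsymbol{\Sigma}) / \partial \mu_k = 0$ based on (12)
leads to the coordinate descent implementation of the decorrelating detector:

$$\hat{\mu}_k = \frac{1}{A_k} \mathbf{s}_k^T (\mathbf{r} - \mathbf{S} \mathbf{A} \boldsymbol{\mu}_{\setminus k}), \tag{20}$$

which is the standard-form SIC detector seen in the literature. If $\mu_{\min}$ and $\mu_{\max}$ are finite, we need
to solve (18) subject to $\mu_{\min} \le \mu_k \le \mu_{\max}$, which corresponds to clipped SIC. ■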
To verify that (19) and (20) do converge to MMSE or decorrelator solutions, and to gain further
insights into the convergence behavior when the optimization constraints are active (clipped SIC),
we invoke the following theorem [28]:

Theorem 1 (Luo and Tseng, 1992) Consider an optimization problem:

$$\min f(\mathbf{x}) = g(\mathbf{E}\mathbf{x}) + \mathbf{c}^T \mathbf{x}, \quad \text{s.t. } \mathbf{x} \in \mathcal{X}, \tag{21}$$

where $\mathcal{X}$ is a box (possibly unbounded) in $\mathbb{R}^n$, $f$ is a proper closed convex function in $\mathbb{R}^n$, $g$ is a
proper closed convex function in $\mathbb{R}^m$, $\mathbf{E}$ is an $m \times n$ matrix having no zero column, and $\mathbf{c} \in \mathbb{R}^n$.
Also assume

1. The set of optimal solutions for (21), denoted by $\mathcal{X}^*$, is nonempty;

2. The domain of $g$ is open and $g$ is strictly convex twice continuously differentiable on the
domain;

3. $\nabla^2 g(\mathbf{E}\mathbf{x}^*)$ is positive definite for all $\mathbf{x}^* \in \mathcal{X}^*$.

Then if $\{\mathbf{x}^r\}$ is a sequence of iterates generated by the coordinate descent method according to the
Almost Cyclic Rule or Gauss-Southwell Rule, $\{\mathbf{x}^r\}$ converges at least linearly to an element of $\mathcal{X}^*$.

Since the objective function of optimization, $\mathcal{F}(\boldsymbol{\mu}, \boldsymbol{\Sigma})$, satisfies all conditions in the theorem
when the spreading codes are linearly independent, it is clear that this theorem applies to the
general linear/clipped SIC setting. Also, because the objective function is a strictly convex quadratic and the
constraints are linear, there is a unique optimal solution in $\mathcal{X}^*$. We may thus conclude the
following:

Corollary 1 Linear/Clipped SIC are guaranteed to converge to the unique minimum free energy
defined by $\mathcal{F}(\boldsymbol{\mu}, \boldsymbol{\Sigma})$ and the constraint ($\mu_{\min} \le \mu_k \le \mu_{\max}$), and the rate of convergence is at least
linear.

This result is proven for the first time to our knowledge. Additionally, we may relax the conven- 
tional cyclic order of iteration for SIC and assert that as long as the coordinates are iterated upon 
according to either the Almost Cyclic Rule or Gauss- Southwell Rule, at least linear convergence 
rate is guaranteed. These relaxed iteration rules are discussed in [28] . 

In the sequel, we will investigate a few SISO multiuser detectors within the variational inference
framework. Unlike the uncoded detectors studied previously, we will now make use of the soft
output provided by $Q(\mathbf{b})$ to facilitate iterative multiuser joint decoding. We will demonstrate that
a unique SISO detector is determined by choosing 1) the postulated distributions (like (11) and
(12), but with biased priors), and 2) the message-passing schedule for joint decoding.



4.4 VFEM Interpretation of Gaussian SISO Multiuser Detector 



Definition 1 A Gaussian SISO Multiuser Detector is a multiuser detector that obtains soft
estimates $Q(\mathbf{b})$ through the VFEM routine, subject to the following postulated distributions:

$$\begin{cases} p(\mathbf{b}) = \mathcal{N}(\tilde{\mathbf{b}}, \mathbf{W}) \\ p(\mathbf{r}|\mathbf{b}) = \mathcal{N}(\mathbf{S}\mathbf{A}\mathbf{b}, \sigma^2\mathbf{I}) \\ Q(\mathbf{b}) = \mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma}), \end{cases} \tag{22}$$

where $\tilde{\mathbf{b}} = [\tilde{b}_1, \cdots, \tilde{b}_K]^T$ are the soft bit estimates from the APP decoder, and
$\mathbf{W} = \mathrm{diag}([1 - \tilde{b}_1^2, \cdots, 1 - \tilde{b}_K^2]^T)$.

We name this detector Gaussian SISO MUD because, like the uncoded linear detectors in
Section 4.2, Gaussian densities are assumed for the prior and posterior distributions of $\mathbf{b}$. But
unlike the linear detectors, this detector is capable of accepting informative priors, as well as
generating soft posterior bit probabilities.



4.4.1 The Existing Form of Gaussian SISO MUD 

A ground-breaking turbo detection scheme was proposed by Wang and Poor [3], spurring a tremendous
amount of interest in turbo MUD and turbo equalization in the years that followed. It involves
a two-stage process: First, the soft bit estimate from the APP decoder is remodulated and subtracted
from the matched filter output:

$$\mathbf{y}_k \triangleq \mathbf{y} - \mathbf{R}\mathbf{A}\tilde{\mathbf{b}}_k. \tag{23}$$

In (23), $\tilde{\mathbf{b}}_k = [\tilde{b}_1, \cdots, \tilde{b}_{k-1}, 0, \tilde{b}_{k+1}, \cdots, \tilde{b}_K]^T$, which is equal to the vector of soft bit estimates
coming from the APP decoder, $\tilde{\mathbf{b}}$, except that the $k$-th element is 0.

Second, a linear MMSE filter is used to further suppress the residual interference. It can be
shown that the filter output is

$$z_k = \underbrace{A_k \mathbf{e}_k^T \left[\mathbf{A}^T\mathbf{W}_k\mathbf{A} + \sigma^2\mathbf{R}^{-1}\right]^{-1}}_{\text{MMSE with residual MAI}} \underbrace{\left[\mathbf{R}^{-1}\mathbf{y} - \mathbf{A}\tilde{\mathbf{b}}_k\right]}_{\text{Soft IC}}, \tag{24}$$

where $\mathbf{e}_k$ denotes a $K$-vector of all zeros, except for the $k$-th element being 1, and
$\mathbf{W}_k = \mathrm{diag}([1 - \tilde{b}_1^2, \cdots, 1 - \tilde{b}_{k-1}^2, 1, 1 - \tilde{b}_{k+1}^2, \cdots, 1 - \tilde{b}_K^2]^T)$.

In order to convert the MMSE filter output $z_k$ into a soft estimate in the discrete domain, a
Gaussian equivalent channel assumption is made about $z_k$, i.e.,

$$z_k = \alpha_k b_k + \eta_k, \tag{25}$$

where $\alpha_k$ is a constant and $p(\eta_k) = \mathcal{N}(0, \nu_k^2)$. In other words, $p(z_k|b_k) = \mathcal{N}(\alpha_k b_k, \nu_k^2)$. Since $\alpha_k$
and $\nu_k^2$ can be found to be, respectively,

$$\alpha_k = A_k^2\left[(\mathbf{A}^T\mathbf{W}_k\mathbf{A} + \sigma^2\mathbf{R}^{-1})^{-1}\right]_{k,k}, \qquad \nu_k^2 = \alpha_k - \alpha_k^2, \tag{26}$$

the output EXT can be written as

$$\mathrm{LLR}_{mud}(b_k) = \log\frac{p(\mathbf{r}|b_k = 1)}{p(\mathbf{r}|b_k = -1)} \approx \log\frac{p(z_k|b_k = 1)}{p(z_k|b_k = -1)} = \frac{2z_k}{1 - \alpha_k}. \tag{27}$$

In essence, the target distribution $p(\mathbf{r}|b_k)$ is approximated by $p(z_k|b_k)$ to obtain the EXT. We
will now demonstrate that with the VFEM formulation, the two-stage process can be derived from
a single optimization procedure, and without the heuristic Gaussian assumption about $z_k$.

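Proposition 4 The SISO multiuser detection scheme described in [3] is an instance of Gaussian
SISO MUD.

Proof: If the extrinsic information is extracted following the sequential schedule in Section 3.1
by ignoring the prior information for $b_k$, then (22) may be modified as

$$\begin{cases} p(\mathbf{b}) = \mathcal{N}(\tilde{\mathbf{b}}_k, \mathbf{W}_k) \\ p(\mathbf{r}|\mathbf{b}) = \mathcal{N}(\mathbf{S}\mathbf{A}\mathbf{b}, \sigma^2\mathbf{I}) \\ Q(\mathbf{b}) = \mathcal{N}(\boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k), \end{cases} \tag{28}$$

where $\tilde{\mathbf{b}}_k = [\tilde{b}_1, \cdots, \tilde{b}_{k-1}, 0, \tilde{b}_{k+1}, \cdots, \tilde{b}_K]^T$ and
$\mathbf{W}_k = \mathrm{diag}([1 - \tilde{b}_1^2, \cdots, 1 - \tilde{b}_{k-1}^2, 1, 1 - \tilde{b}_{k+1}^2, \cdots, 1 - \tilde{b}_K^2]^T)$. From (28), it can be shown that

$$\begin{aligned} \mathcal{F}_{gauss}(\boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k) = {} & \frac{1}{2\sigma^2}\left[\boldsymbol{\mu}_k^T(\mathbf{A}^T\mathbf{S}^T\mathbf{S}\mathbf{A} + \sigma^2\mathbf{W}_k^{-1})\boldsymbol{\mu}_k - 2(\mathbf{r}^T\mathbf{S}\mathbf{A} + \sigma^2\tilde{\mathbf{b}}_k^T\mathbf{W}_k^{-1})\boldsymbol{\mu}_k\right] \\ & - \frac{1}{2}\log|\boldsymbol{\Sigma}_k| + \frac{1}{2}\mathrm{tr}(\mathbf{W}_k^{-1}\boldsymbol{\Sigma}_k) + \frac{1}{2\sigma^2}\mathrm{tr}(\mathbf{A}^T\mathbf{S}^T\mathbf{S}\mathbf{A}\boldsymbol{\Sigma}_k). \end{aligned} \tag{29}$$

Let $\mu'_k$ denote the $k$-th element of $\boldsymbol{\mu}_k$. Solving $\partial\mathcal{F}_{gauss}(\boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)/\partial\boldsymbol{\mu}_k = \mathbf{0}$ yields

$$\mu'_k = \mathbf{e}_k^T(\mathbf{A}^T\mathbf{S}^T\mathbf{S}\mathbf{A} + \sigma^2\mathbf{W}_k^{-1})^{-1}\mathbf{A}^T\mathbf{S}^T(\mathbf{r} - \mathbf{S}\mathbf{A}\tilde{\mathbf{b}}_k), \tag{30}$$

which is identical to $z_k$ in (24).

One piece of information that the MMSE-based detector in [3] does not have is the covariance
matrix of the posterior distribution, which can be shown to be

$$\boldsymbol{\Sigma}_k = \left(\frac{1}{\sigma^2}\mathbf{A}^T\mathbf{S}^T\mathbf{S}\mathbf{A} + \mathbf{W}_k^{-1}\right)^{-1}. \tag{31}$$

In other words, the marginal posterior distribution of $b_k$ is $Q(b_k) = \mathcal{N}(\mu'_k, [\boldsymbol{\Sigma}_k]_{k,k})$. Since the
prior distribution of $b_k$ is ignored during the detection operation, $Q(b_k)$ obtained as such is in fact
proportional to $p(\mathbf{r}|b_k)$. Therefore,

$$\mathrm{LLR}_{mud}(b_k) = \log\frac{p(\mathbf{r}|b_k = 1)}{p(\mathbf{r}|b_k = -1)} \approx \log\frac{Q(b_k = 1)}{Q(b_k = -1)} = \frac{2\mu'_k}{[\boldsymbol{\Sigma}_k]_{k,k}}. \tag{32}$$

Applying the matrix inversion lemma on $\boldsymbol{\Sigma}_k$ in (31), we have

$$\boldsymbol{\Sigma}_k = \mathbf{W}_k - \mathbf{W}_k\mathbf{A}(\mathbf{A}\mathbf{W}_k\mathbf{A} + \sigma^2\mathbf{R}^{-1})^{-1}\mathbf{A}\mathbf{W}_k. \tag{33}$$

Since $[\mathbf{W}_k]_{k,k} = 1$, $[\boldsymbol{\Sigma}_k]_{k,k} = 1 - A_k^2[(\mathbf{A}\mathbf{W}_k\mathbf{A} + \sigma^2\mathbf{R}^{-1})^{-1}]_{k,k} = 1 - \alpha_k$, where $\alpha_k$ is as defined in
(26). Therefore,

$$\mathrm{LLR}_{mud}(b_k) = \frac{2\mu'_k}{[\boldsymbol{\Sigma}_k]_{k,k}} = \frac{2z_k}{1 - \alpha_k}. \tag{34}$$

We have thus re-derived the Wang-Poor scheme via a radically different approach. It is remarkable
how the variational inference viewpoint leads to exactly the same outcome as [3], while the
conditional Gaussian assumption made about the MMSE filter output is no longer necessary. ■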
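The equivalence established in the proof can be checked numerically. The following is a minimal sketch for $K = 2$ users, working directly with the matched filter output $\mathbf{y} = \mathbf{S}^T\mathbf{r}$ and the correlation matrix $\mathbf{R}$; the helper names (`inv2`, `matvec`, `wang_poor_llr`, `vfem_llr`) and all numerical values are hypothetical choices made for illustration, not part of the original scheme.

```python
def inv2(M):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(2)) for i in range(2)]

def wang_poor_llr(k, R, A, y, b_soft, sigma2):
    """EXT of user k via soft IC and MMSE filtering, (24), (26), (27)."""
    bk = list(b_soft)
    bk[k] = 0.0                                  # prior of user k is ignored
    w = [1.0 - x * x for x in bk]                # diagonal of W_k, [W_k]_kk = 1
    Rinv = inv2(R)
    G = inv2([[A[i] * w[i] * A[j] * (i == j) + sigma2 * Rinv[i][j]
               for j in range(2)] for i in range(2)])
    Riy = matvec(Rinv, y)
    z = A[k] * matvec(G, [Riy[i] - A[i] * bk[i] for i in range(2)])[k]
    alpha = A[k] ** 2 * G[k][k]
    return 2.0 * z / (1.0 - alpha)

def vfem_llr(k, R, A, y, b_soft, sigma2):
    """EXT of user k from the Gaussian posterior, (30)-(32)."""
    bk = list(b_soft)
    bk[k] = 0.0
    w = [1.0 - x * x for x in bk]
    Sigma = inv2([[A[i] * R[i][j] * A[j] / sigma2 + (i == j) / w[i]
                   for j in range(2)] for i in range(2)])          # (31)
    Rab = matvec(R, [A[i] * bk[i] for i in range(2)])
    rhs = [A[i] * (y[i] - Rab[i]) / sigma2 for i in range(2)]
    mu_k = matvec(Sigma, rhs)[k]                 # (30); k-th entry of b_k is zero
    return 2.0 * mu_k / Sigma[k][k]

# Hypothetical test point.
R = [[1.0, 0.7], [0.7, 1.0]]
A = [1.0, 1.5]
y = [0.4, -1.1]
b_soft = [0.3, -0.6]
pairs = [(wang_poor_llr(k, R, A, y, b_soft, 0.5), vfem_llr(k, R, A, y, b_soft, 0.5))
         for k in range(2)]
```

Both routes should return the same LLR for each user up to rounding, which is the content of (34).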






After taking APP decoding into account, the Wang-Poor turbo MUD algorithm as a whole can 
be seen as hybrid-Gaussian-SISO MUD. In the next section, we will systematically investigate all 
three possible scheduling schemes applied to Gaussian SISO MUD. 



4.4.2 The Standard Forms of Gaussian SISO MUD 

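In Table 1 we summarize three different versions of standard Gaussian SISO MUD. In the following,
we point out some of the major characteristics associated with each one, and, in particular, introduce
the new flooding schedule implementation.

Table 1: Three scheduling schemes of turbo MUD employing Gaussian SISO MUD.

Sequential-Gaussian-SISO

Initialization: $\tilde{\mathbf{b}} = \mathbf{0}$
FOR j = 1 : J (Outer Iteration)
    FOR k = 1 : K
        $\tilde{\mathbf{b}}_k = [\tilde{b}_1, \cdots, \tilde{b}_{k-1}, 0, \tilde{b}_{k+1}, \cdots, \tilde{b}_K]^T$
        $\mathbf{W}_k = \mathrm{diag}([1 - \tilde{b}_1^2, \cdots, 1 - \tilde{b}_{k-1}^2, 1, 1 - \tilde{b}_{k+1}^2, \cdots, 1 - \tilde{b}_K^2]^T)$
        $\mu'_k = A_k\mathbf{e}_k^T[\mathbf{A}^T\mathbf{W}_k\mathbf{A} + \sigma^2\mathbf{R}^{-1}]^{-1}[\mathbf{R}^{-1}\mathbf{y} - \mathbf{A}\tilde{\mathbf{b}}_k]$
        $\alpha_k = A_k^2[(\mathbf{A}^T\mathbf{W}_k\mathbf{A} + \sigma^2\mathbf{R}^{-1})^{-1}]_{k,k}$
        $\mathrm{LLR}_{mud}(b_k) = 2\mu'_k/(1 - \alpha_k)$
        $\mathrm{LLR}_{dec}(b_k) \xLeftarrow{\text{Decoding}} \mathrm{LLR}_{mud}(b_k)$
        $\tilde{b}_k = \tanh[\mathrm{LLR}_{dec}(b_k)/2]$
    END
END

Flooding-Gaussian-SISO

Initialization: $\tilde{\mathbf{b}} = \mathbf{0}$
FOR j = 1 : J (Outer Iteration)
    FOR k = 1 : K
        $\tilde{\mathbf{b}}_k = [\tilde{b}_1, \cdots, \tilde{b}_{k-1}, 0, \tilde{b}_{k+1}, \cdots, \tilde{b}_K]^T$
        $\mathbf{W} = \mathrm{diag}([1 - \tilde{b}_1^2, \cdots, 1 - \tilde{b}_K^2]^T)$
        $\hat{\mu}_k = A_k\mathbf{e}_k^T[\mathbf{A}^T\mathbf{W}\mathbf{A} + \sigma^2\mathbf{R}^{-1}]^{-1}[\mathbf{R}^{-1}\mathbf{y} - \mathbf{A}\tilde{\mathbf{b}}_k]$
        $\alpha_k = (1 - \tilde{b}_k^2)A_k^2[(\mathbf{A}^T\mathbf{W}\mathbf{A} + \sigma^2\mathbf{R}^{-1})^{-1}]_{k,k}$
        $\mathrm{LLR}_{mud}(b_k) = 2\hat{\mu}_k/(1 - \alpha_k)$
    END
    FOR k = 1 : K
        $\mathrm{LLR}_{dec}(b_k) \xLeftarrow{\text{Decoding}} \mathrm{LLR}_{mud}(b_k)$
        $\tilde{b}_k = \tanh[\mathrm{LLR}_{dec}(b_k)/2]$
    END
END

Hybrid-Gaussian-SISO

Initialization: $\tilde{\mathbf{b}} = \mathbf{0}$
FOR j = 1 : J (Outer Iteration)
    FOR k = 1 : K
        $\tilde{\mathbf{b}}_k = [\tilde{b}_1, \cdots, \tilde{b}_{k-1}, 0, \tilde{b}_{k+1}, \cdots, \tilde{b}_K]^T$
        $\mathbf{W}_k = \mathrm{diag}([1 - \tilde{b}_1^2, \cdots, 1 - \tilde{b}_{k-1}^2, 1, 1 - \tilde{b}_{k+1}^2, \cdots, 1 - \tilde{b}_K^2]^T)$
        $\mu'_k = A_k\mathbf{e}_k^T[\mathbf{A}^T\mathbf{W}_k\mathbf{A} + \sigma^2\mathbf{R}^{-1}]^{-1}[\mathbf{R}^{-1}\mathbf{y} - \mathbf{A}\tilde{\mathbf{b}}_k]$
        $\alpha_k = A_k^2[(\mathbf{A}^T\mathbf{W}_k\mathbf{A} + \sigma^2\mathbf{R}^{-1})^{-1}]_{k,k}$
        $\mathrm{LLR}_{mud}(b_k) = 2\mu'_k/(1 - \alpha_k)$
    END
    FOR k = 1 : K
        $\mathrm{LLR}_{dec}(b_k) \xLeftarrow{\text{Decoding}} \mathrm{LLR}_{mud}(b_k)$
        $\tilde{b}_k = \tanh[\mathrm{LLR}_{dec}(b_k)/2]$
    END
END

Sequential-Gaussian-SISO: In Section 4.4.1 we presented an elegant variational-inference-based
approach to obtain the EXT at the SISO detector output, which coincides with the EXT
conventionally calculated through soft interference cancellation and MMSE filtering.

In contrast to [3], however, where the EXT's are stored until all users are processed and then
used for APP decoding in parallel, the sequential schedule requires that the EXT, $\mathrm{LLR}_{mud}(b_k)$, be
directly passed down to the APP decoder. Then the EXT from the APP decoder, viewed by the
detector as the updated prior $\tilde{b}_k$, is immediately used for the detection of $b_{k+1}$.

Flooding-Gaussian-SISO: The flooding schedule allows the APP decoding of all users to be done
in parallel. In the detection stage, some changes to the derivation presented in Section 4.4.1 are
needed, since the prior information of $b_k$ should not be ignored as is done in (28). Instead, the
postulated distributions in (22) are adopted.

Given (22), the free energy becomes

$$\begin{aligned} \mathcal{F}_{gauss}(\boldsymbol{\mu}, \boldsymbol{\Sigma}) = {} & \frac{1}{2\sigma^2}\left[\boldsymbol{\mu}^T(\mathbf{A}^T\mathbf{S}^T\mathbf{S}\mathbf{A} + \sigma^2\mathbf{W}^{-1})\boldsymbol{\mu} - 2(\mathbf{r}^T\mathbf{S}\mathbf{A} + \sigma^2\tilde{\mathbf{b}}^T\mathbf{W}^{-1})\boldsymbol{\mu}\right] \\ & - \frac{1}{2}\log|\boldsymbol{\Sigma}| + \frac{1}{2}\mathrm{tr}(\mathbf{W}^{-1}\boldsymbol{\Sigma}) + \frac{1}{2\sigma^2}\mathrm{tr}(\mathbf{A}^T\mathbf{S}^T\mathbf{S}\mathbf{A}\boldsymbol{\Sigma}). \end{aligned} \tag{35}$$

Solving $\partial\mathcal{F}(\boldsymbol{\mu}, \boldsymbol{\Sigma})/\partial\boldsymbol{\mu} = \mathbf{0}$ and $\partial\mathcal{F}(\boldsymbol{\mu}, \boldsymbol{\Sigma})/\partial\boldsymbol{\Sigma}^{-1} = \mathbf{0}$ leads to the minimizer of $\mathcal{F}_{gauss}(\boldsymbol{\mu}, \boldsymbol{\Sigma})$ in
(35):

$$\begin{aligned} \boldsymbol{\mu} &= (\mathbf{A}^T\mathbf{S}^T\mathbf{S}\mathbf{A} + \sigma^2\mathbf{W}^{-1})^{-1}(\mathbf{A}^T\mathbf{S}^T\mathbf{r} + \sigma^2\mathbf{W}^{-1}\tilde{\mathbf{b}}) \\ \boldsymbol{\Sigma} &= (\sigma^{-2}\mathbf{A}^T\mathbf{S}^T\mathbf{S}\mathbf{A} + \mathbf{W}^{-1})^{-1}. \end{aligned} \tag{36}$$

It implies that the approximate posterior distribution is $p(\mathbf{b}|\mathbf{r}) \approx Q(\mathbf{b}) = \mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma})$. In other
words, the marginal posterior distribution of $b_k$ is $p(b_k|\mathbf{r}) \approx \mathcal{N}(\mu_k, [\boldsymbol{\Sigma}]_{k,k})$. Recalling that in (22),
$p(b_k) = \mathcal{N}(\tilde{b}_k, 1 - \tilde{b}_k^2)$, if we apply the flooding schedule in Section 3.2 to extract the EXT, then

$$p(\mathbf{r}|b_k) \propto \frac{p(b_k|\mathbf{r})}{p(b_k)} \approx \frac{Q(b_k)}{p(b_k)} \propto \mathcal{N}(\mu_{ext}, \sigma_{ext}^2), \tag{37}$$

where

$$\frac{\mu_{ext}}{\sigma_{ext}^2} = \frac{\mu_k}{[\boldsymbol{\Sigma}]_{k,k}} - \frac{\tilde{b}_k}{1 - \tilde{b}_k^2}, \qquad \frac{1}{\sigma_{ext}^2} = \frac{1}{[\boldsymbol{\Sigma}]_{k,k}} - \frac{1}{1 - \tilde{b}_k^2} \tag{38}$$

is true, because if $\mathcal{N}(\mu_1, \sigma_1^2)\,\mathcal{N}(\mu_2, \sigma_2^2) \propto \mathcal{N}(\mu_3, \sigma_3^2)$, then [11]

$$\frac{\mu_3}{\sigma_3^2} = \frac{\mu_1}{\sigma_1^2} + \frac{\mu_2}{\sigma_2^2}, \qquad \frac{1}{\sigma_3^2} = \frac{1}{\sigma_1^2} + \frac{1}{\sigma_2^2}. \tag{39}$$

Finally, sampling $\mathcal{N}(\mu_{ext}, \sigma_{ext}^2)$ at $b_k = 1$ and $b_k = -1$, we obtain

$$\begin{aligned} \mathrm{LLR}_{mud}(b_k) &= \log\frac{\mathcal{N}(b_k = 1;\, \mu_{ext}, \sigma_{ext}^2)}{\mathcal{N}(b_k = -1;\, \mu_{ext}, \sigma_{ext}^2)} = \frac{2\mu_{ext}}{\sigma_{ext}^2} = \frac{2\mu_k}{[\boldsymbol{\Sigma}]_{k,k}} - \frac{2\tilde{b}_k}{1 - \tilde{b}_k^2} \\ &= \frac{2\mathbf{e}_k^T\left[\tilde{\mathbf{b}} + \mathbf{W}\mathbf{A}(\mathbf{A}\mathbf{W}\mathbf{A} + \sigma^2\mathbf{R}^{-1})^{-1}(\mathbf{R}^{-1}\mathbf{y} - \mathbf{A}\tilde{\mathbf{b}})\right]}{(1 - \tilde{b}_k^2) - (1 - \tilde{b}_k^2)^2A_k^2[(\mathbf{A}\mathbf{W}\mathbf{A} + \sigma^2\mathbf{R}^{-1})^{-1}]_{k,k}} - \frac{2\tilde{b}_k}{1 - \tilde{b}_k^2} \\ &= \frac{2\mathbf{e}_k^T\left[\mathbf{W}\mathbf{A}(\mathbf{A}\mathbf{W}\mathbf{A} + \sigma^2\mathbf{R}^{-1})^{-1}(\mathbf{R}^{-1}\mathbf{y} - \mathbf{A}\tilde{\mathbf{b}})\right] + 2\tilde{b}_k(1 - \tilde{b}_k^2)A_k^2[(\mathbf{A}\mathbf{W}\mathbf{A} + \sigma^2\mathbf{R}^{-1})^{-1}]_{k,k}}{(1 - \tilde{b}_k^2)\left\{1 - (1 - \tilde{b}_k^2)A_k^2[(\mathbf{A}\mathbf{W}\mathbf{A} + \sigma^2\mathbf{R}^{-1})^{-1}]_{k,k}\right\}} \\ &= \frac{2(1 - \tilde{b}_k^2)A_k\mathbf{e}_k^T(\mathbf{A}\mathbf{W}\mathbf{A} + \sigma^2\mathbf{R}^{-1})^{-1}(\mathbf{R}^{-1}\mathbf{y} - \mathbf{A}\tilde{\mathbf{b}}_k)}{(1 - \tilde{b}_k^2)\left\{1 - (1 - \tilde{b}_k^2)A_k^2[(\mathbf{A}\mathbf{W}\mathbf{A} + \sigma^2\mathbf{R}^{-1})^{-1}]_{k,k}\right\}} = \frac{2\hat{\mu}_k}{1 - \alpha_k}, \end{aligned} \tag{40}$$

where

$$\begin{aligned} \hat{\mu}_k &= A_k\mathbf{e}_k^T(\mathbf{A}\mathbf{W}\mathbf{A} + \sigma^2\mathbf{R}^{-1})^{-1}(\mathbf{R}^{-1}\mathbf{y} - \mathbf{A}\tilde{\mathbf{b}}_k) \\ \alpha_k &= (1 - \tilde{b}_k^2)A_k^2[(\mathbf{A}\mathbf{W}\mathbf{A} + \sigma^2\mathbf{R}^{-1})^{-1}]_{k,k}. \end{aligned} \tag{41}$$

In (41), $\hat{\mu}_k$ can also be computed more efficiently as

$$\hat{\mu}_k = A_k\mathbf{e}_k^T(\mathbf{A}\mathbf{W}\mathbf{A} + \sigma^2\mathbf{R}^{-1})^{-1}(\mathbf{R}^{-1}\mathbf{y} - \mathbf{A}\tilde{\mathbf{b}}) + A_k^2\tilde{b}_k[(\mathbf{A}\mathbf{W}\mathbf{A} + \sigma^2\mathbf{R}^{-1})^{-1}]_{k,k}, \tag{42}$$

such that common information may be utilized to evaluate $\hat{\mu}_k$ for all $k$.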
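As a sanity check on the flooding-schedule EXT, the sketch below evaluates it in two ways for $K = 2$ users: through the shared-computation form of (41) and (42), and directly as the posterior LLR minus the prior LLR from (36) and (38). The function names and the numerical values are hypothetical and chosen only for illustration; the matched filter output $\mathbf{y} = \mathbf{S}^T\mathbf{r}$ is used in place of $\mathbf{r}$.

```python
def inv2(M):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(2)) for i in range(2)]

def flooding_ext(R, A, y, b_soft, sigma2):
    """EXT of both users via (40)-(42); one matrix inverse is shared by all users."""
    w = [1.0 - x * x for x in b_soft]                      # diagonal of W
    Rinv = inv2(R)
    G = inv2([[A[i] * w[i] * A[j] * (i == j) + sigma2 * Rinv[i][j]
               for j in range(2)] for i in range(2)])
    Riy = matvec(Rinv, y)
    v = matvec(G, [Riy[i] - A[i] * b_soft[i] for i in range(2)])   # common term
    llr = []
    for k in range(2):
        mu_hat = A[k] * v[k] + A[k] ** 2 * b_soft[k] * G[k][k]     # (42)
        alpha = w[k] * A[k] ** 2 * G[k][k]                         # (41)
        llr.append(2.0 * mu_hat / (1.0 - alpha))
    return llr

def posterior_minus_prior(R, A, y, b_soft, sigma2):
    """Same EXT computed from the Gaussian posterior (36) and the prior, via (38)."""
    w = [1.0 - x * x for x in b_soft]
    Sigma = inv2([[A[i] * R[i][j] * A[j] / sigma2 + (i == j) / w[i]
                   for j in range(2)] for i in range(2)])
    mu = matvec(Sigma, [A[i] * y[i] / sigma2 + b_soft[i] / w[i] for i in range(2)])
    return [2.0 * mu[k] / Sigma[k][k] - 2.0 * b_soft[k] / w[k] for k in range(2)]

# Hypothetical test point.
R = [[1.0, 0.7], [0.7, 1.0]]
A = [1.0, 1.5]
y = [0.4, -1.1]
b_soft = [0.3, -0.6]
ext_fast = flooding_ext(R, A, y, b_soft, 0.5)
ext_direct = posterior_minus_prior(R, A, y, b_soft, 0.5)
```

The two results should agree to rounding error, while the first route needs only one inverse of $\mathbf{A}\mathbf{W}\mathbf{A} + \sigma^2\mathbf{R}^{-1}$ per outer iteration.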

Hybrid-Gaussian-SISO: As mentioned earlier, the Wang-Poor turbo MUD scheme is exactly
the hybrid-Gaussian-SISO MUD. It differs from the sequential schedule in that the EXT for $b_k$
generated by the SISO detector is now stored until the EXT's of all users $k = 1, \cdots, K$ are ready.
Then the EXT's are passed down to the APP decoders, for decoding in parallel. Hybrid-Gaussian-SISO
MUD brings computational savings compared to the more optimal sequential-Gaussian-SISO MUD,
due to both the possibility of parallel decoding, and the ease of evaluating $[\mathbf{A}^T\mathbf{W}_k\mathbf{A} + \sigma^2\mathbf{R}^{-1}]^{-1}$.

So far, based on the Gaussian distributions assumed in the postulation step, we showed that
the variational inference algorithm converges to a family of Gaussian SISO detectors, including
the well-known Wang-Poor scheme as a special case. But the VFEM framework allows us to
generalize even further, since the Gaussian distributions, albeit convenient, are unnatural choices
for BPSK symbols. The subsequent section will focus on a different family of detectors induced by
a different set of assumptions in the postulation step.

4.5 VFEM Interpretation of Discrete SISO Multiuser Detector 

Definition 2 A Discrete SISO Multiuser Detector is a multiuser detector that obtains soft
estimates $Q(\mathbf{b})$ through the VFEM routine, subject to the following postulated distributions:

$$\begin{cases} p(\mathbf{b}) = \prod_{k=1}^{K}\xi_k^{\frac{1+b_k}{2}}(1 - \xi_k)^{\frac{1-b_k}{2}}, & b_k \in \{\pm 1\} \\ p(\mathbf{r}|\mathbf{b}) = \mathcal{N}(\mathbf{S}\mathbf{A}\mathbf{b}, \sigma^2\mathbf{I}) \\ Q(\mathbf{b}) = \prod_{k=1}^{K}\gamma_k^{\frac{1+b_k}{2}}(1 - \gamma_k)^{\frac{1-b_k}{2}}, & b_k \in \{\pm 1\}, \end{cases} \tag{43}$$

where $\xi_k$ and $\gamma_k$ are the prior and posterior probabilities of $b_k$ being 1.

The discrete SISO MUD has two salient features in the postulated distributions: 1) Both the
prior and posterior distributions are discrete, conforming to the actual properties of the data; 2) The
posterior distributions of individual bits, $\{b_k\}_{k=1}^{K}$, are assumed to be independent by applying the
mean-field approximation. Indeed, the only distinction between this scheme and the jointly optimal
detector is the mean-field approximation about the posterior, which, though a crude assumption
in general, is asymptotically exact in the large system limit. This technique is closely tied to the
replica method used to study the performance of randomly spread CDMA [13]. The mean-field
approximation is also used in [29] and [30] to derive multiuser detectors for uncoded CDMA.

4.5.1 The Existing Form of Discrete SISO MUD 

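In [2], a simple (linear complexity) multiuser detector was proposed for coded CDMA producing
near optimal performance at very high network load. Alexander, Grant and Reed applied a simple
interference cancellation scheme and made the following observation:

$$\begin{aligned} p(y_k|b_k, \mathbf{b}_{\setminus k} = \tilde{\mathbf{b}}_k) &= \frac{1}{\sqrt{2\pi\sigma_{tot}^2}}\exp\left\{-\frac{1}{2\sigma_{tot}^2}\left[y_k - A_k b_k - \mathbf{s}_k^T\mathbf{S}\mathbf{A}\tilde{\mathbf{b}}_k\right]^2\right\} \\ &= \frac{1}{\sqrt{2\pi\sigma_{tot}^2}}\exp\left\{-\frac{1}{2\sigma_{tot}^2}\left[A_k b_k - \mathbf{s}_k^T(\mathbf{r} - \mathbf{S}\mathbf{A}\tilde{\mathbf{b}}_k)\right]^2\right\}, \end{aligned} \tag{44}$$

where $\tilde{\mathbf{b}}$ is the average bit estimate received from the APP decoder, and
$\tilde{\mathbf{b}}_k = [\tilde{b}_1, \cdots, \tilde{b}_{k-1}, 0, \tilde{b}_{k+1}, \cdots, \tilde{b}_K]^T$.
Defining $\sigma_{tot}^2 = \sigma^2 + \sigma_{MAI}^2$ as the variance of the combined channel noise and residual MAI modelled
as Gaussian noise, $\sigma_{tot}^2$ can be approximated as the sample average of $[\mathbf{s}_k^T(\mathbf{r} - \mathbf{S}\mathbf{A}\tilde{\mathbf{b}})]^2$.
The soft estimate of $b_k$ can then be drawn from (44) as a log-likelihood ratio:

$$\mathrm{LLR}(b_k) = \frac{2}{\sigma_{tot}^2}A_k\mathbf{s}_k^T(\mathbf{r} - \mathbf{S}\mathbf{A}\tilde{\mathbf{b}}_k), \tag{45}$$

and fed back to the APP channel code decoders. The decoders subsequently update $\tilde{\mathbf{b}}$ for the
next iteration. Now we proceed to prove the link of this simple and effective scheme to the VFEM
framework.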

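Proposition 5 The SISO multiuser detection scheme described in [2] is an instance of the Discrete
SISO MUD.

Proof: Let the prior distribution $p(\mathbf{b})$ in (43) represent the EXT provided by the APP decoder.
Also, $p(\mathbf{b}) = \prod_{k=1}^{K}p(b_k)$, where $p(b_k) = \xi_k^{\frac{1+b_k}{2}}(1 - \xi_k)^{\frac{1-b_k}{2}}$ implies that
$p(b_k = 1) = (\xi_k)^1(1 - \xi_k)^0 = \xi_k$ and $p(b_k = -1) = (\xi_k)^0(1 - \xi_k)^1 = 1 - \xi_k$. As seen from the derivation in (44), in the traditional
MUD viewpoint, this information may be used for soft interference cancellation in the detection
stage. We will now demonstrate that this IC technique corresponds to one iteration of recursive
minimization of variational free energy.

We let $\tilde{b}_k = 2\xi_k - 1$ and $m_k = 2\gamma_k - 1$ denote the prior mean and posterior mean of $b_k$.
After some mathematical manipulation, we have, according to (43) and (10),

$$\begin{aligned} \mathcal{F}_{disc}(\mathbf{m}) = {} & \sum_{k=1}^{K}\left[\frac{1 + m_k}{2}\log\frac{1 + m_k}{1 + \tilde{b}_k} + \frac{1 - m_k}{2}\log\frac{1 - m_k}{1 - \tilde{b}_k}\right] + \frac{N}{2}\log\sigma^2 \\ & + \frac{1}{2\sigma^2}\left[\mathbf{r}^T\mathbf{r} - 2\mathbf{r}^T\mathbf{S}\mathbf{A}\mathbf{m} + \mathbf{m}^T\mathbf{B}\mathbf{m} + \mathrm{tr}(\mathbf{A}^T\mathbf{S}^T\mathbf{S}\mathbf{A})\right], \end{aligned} \tag{46}$$

where $\mathbf{B} = \mathbf{A}^T\mathbf{S}^T\mathbf{S}\mathbf{A} - \mathrm{diag}(\mathbf{A}^T\mathbf{S}^T\mathbf{S}\mathbf{A})$. (46) is obtained by utilizing the property that

$$\begin{aligned} \mathrm{E}(\mathbf{b}^T\mathbf{C}\mathbf{b}) &= \mathrm{E}\Big(\sum_{i \ne j}C_{ij}b_ib_j + \sum_{i}C_{ii}b_i^2\Big) \\ &= \mathbf{m}^T[\mathbf{C} - \mathrm{diag}(\mathbf{C})]\mathbf{m} + \mathbf{1}^T\mathrm{diag}(\mathbf{C})\mathbf{1}, \end{aligned} \tag{47}$$

for $\mathbf{b} \in \{\pm 1\}^K$ and $\mathbf{C} = [C_{ij}] \in \mathbb{R}^{K \times K}$.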
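Property (47) relies only on the bits being independent under $Q(\mathbf{b})$ with means $m_k$ and on $b_k^2 = 1$. A brute-force check over all $2^K$ bit patterns makes this concrete; the sketch below uses hypothetical function names and an arbitrary test matrix.

```python
from itertools import product

def expected_quadratic_bruteforce(C, m):
    """E(b^T C b) by enumerating all bit vectors, with independent +/-1 bits of mean m_k."""
    K = len(m)
    total = 0.0
    for b in product((-1, 1), repeat=K):
        prob = 1.0
        for k in range(K):
            prob *= (1 + b[k] * m[k]) / 2        # P(b_k = +1) = (1 + m_k) / 2
        total += prob * sum(C[i][j] * b[i] * b[j] for i in range(K) for j in range(K))
    return total

def expected_quadratic_closed_form(C, m):
    """Right-hand side of (47): off-diagonal part uses the means, diagonal uses b_k^2 = 1."""
    K = len(m)
    off_diag = sum(C[i][j] * m[i] * m[j] for i in range(K) for j in range(K) if i != j)
    return off_diag + sum(C[i][i] for i in range(K))

# Arbitrary test values.
C = [[2.0, 0.5, -1.0], [0.5, 1.0, 0.3], [-1.0, 0.3, 4.0]]
m = [0.2, -0.5, 0.9]
lhs = expected_quadratic_bruteforce(C, m)
rhs = expected_quadratic_closed_form(C, m)
```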

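Rearranging $\partial\mathcal{F}_{disc}(\mathbf{m})/\partial\mathbf{m} = \mathbf{0}$ gives a system of equations, for $k = 1, \cdots, K$, that determines
the minimum of $\mathcal{F}_{disc}(\mathbf{m})$,

$$\log\frac{1 + m_k}{1 - m_k} = \log\frac{1 + \tilde{b}_k}{1 - \tilde{b}_k} + \frac{2}{\sigma^2}\left[\boldsymbol{\eta}_k^T\mathbf{r} - \boldsymbol{\beta}_k^T\mathbf{m}\right], \tag{48}$$

where $\boldsymbol{\eta}_k$ and $\boldsymbol{\beta}_k$ are the $k$-th column vectors of $\mathbf{S}\mathbf{A}$ and $\mathbf{B}$, respectively. The coordinate descent
algorithm minimizes a function successively along one direction at a time. By setting $\partial\mathcal{F}_{disc}(\mathbf{m})/\partial m_k$
to zero in turn, we have the following update for user $k$ in iteration $i$:

$$\mathrm{LLR}^{(i)}(b_k) = \mathrm{LLR}^{(0)}(b_k) + \frac{2}{\sigma^2}\left[\boldsymbol{\eta}_k^T\mathbf{r} - \boldsymbol{\beta}_k^T\mathbf{m}_{<k}^{(i)} - \boldsymbol{\beta}_k^T\mathbf{m}_{>k}^{(i-1)}\right]. \tag{49}$$

In (49), we defined the log-likelihood ratio $\mathrm{LLR}^{(i)}(b_k) = \log\frac{1 + m_k^{(i)}}{1 - m_k^{(i)}}$ (or equivalently,
$m_k^{(i)} = \tanh[\mathrm{LLR}^{(i)}(b_k)/2]$). The iterations are initialized with the prior probabilities of $b_k$, i.e., $m_k^{(0)} = \tilde{b}_k$
and $\mathrm{LLR}^{(0)}(b_k) = \log\frac{1 + \tilde{b}_k}{1 - \tilde{b}_k}$. As well, $\mathbf{m}_{<k} = [m_1, \cdots, m_{k-1}, 0, \cdots, 0]^T$ (with $K - k + 1$ trailing zeros),
while $\mathbf{m}_{>k} = [0, \cdots, 0, m_{k+1}, \cdots, m_K]^T$ (with $k$ leading zeros).

The flooding schedule (see Fig. 3) indicates that the EXT is the ratio between the posterior and
the prior distributions, or the difference between posterior and prior LLR's, i.e., after $I$ iterations,
the multiuser detector passes the following EXT to the $k$-th decoder:

$$\mathrm{LLR}_{mud}(b_k) = \mathrm{LLR}^{(I)}(b_k) - \mathrm{LLR}^{(0)}(b_k) = \frac{2}{\sigma^2}\left[\boldsymbol{\eta}_k^T\mathbf{r} - \boldsymbol{\beta}_k^T\mathbf{m}_{<k}^{(I)} - \boldsymbol{\beta}_k^T\mathbf{m}_{>k}^{(I-1)}\right]. \tag{50}$$

Consider simplifying (50) by removing the serial iterations, then

$$\mathrm{LLR}_{mud}(b_k) = \frac{2}{\sigma^2}\left[\boldsymbol{\eta}_k^T\mathbf{r} - \boldsymbol{\beta}_k^T\mathbf{m}^{(0)}\right] = \frac{2}{\sigma^2}A_k\mathbf{s}_k^T(\mathbf{r} - \mathbf{S}\mathbf{A}\tilde{\mathbf{b}}_k), \tag{51}$$

which is similar to (45). Note that this simplified updating scheme does not guarantee the decrease
of free energy, and thus is not as robust as the standard version in (50). ■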
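The serial update (49) and the EXT in (50) can be sketched as follows. This is a hedged illustration rather than the authors' implementation: it works with the matched filter output $\mathbf{y} = \mathbf{S}^T\mathbf{r}$ and the correlation matrix $\mathbf{R} = \mathbf{S}^T\mathbf{S}$ (so that $\boldsymbol{\eta}_k^T\mathbf{r} = A_k y_k$ and $\boldsymbol{\beta}_k^T\mathbf{m} = A_k\sum_{j \ne k}R_{kj}A_jm_j$), the free energy drops the terms of (46) that do not depend on $\mathbf{m}$, and all names and numbers are hypothetical.

```python
import math

def xlogx(p):
    """p * log(p) with the convention 0 * log(0) = 0."""
    return p * math.log(p) if p > 0.0 else 0.0

def free_energy(m, prior_mean, A, R, y, sigma2):
    """F_disc(m) of (46), omitting terms that do not depend on m."""
    K = len(m)
    kl = 0.0
    for k in range(K):
        p, q = (1 + m[k]) / 2, (1 + prior_mean[k]) / 2
        kl += xlogx(p) + xlogx(1 - p) - p * math.log(q) - (1 - p) * math.log(1 - q)
    lin = sum(A[k] * y[k] * m[k] for k in range(K))              # r^T S A m
    quad = sum(A[i] * R[i][j] * A[j] * m[i] * m[j]
               for i in range(K) for j in range(K) if i != j)    # m^T B m
    return kl + (quad - 2 * lin) / (2 * sigma2)

def discrete_siso(prior_mean, A, R, y, sigma2, iters=5):
    """Serial updates (49); returns posterior means, EXT (50) and the energy trace."""
    K = len(A)
    llr0 = [math.log((1 + b) / (1 - b)) for b in prior_mean]
    m = list(prior_mean)                                         # m^(0) = prior means
    ext = [0.0] * K
    trace = [free_energy(m, prior_mean, A, R, y, sigma2)]
    for _ in range(iters):
        for k in range(K):
            interference = sum(R[k][j] * A[j] * m[j] for j in range(K) if j != k)
            ext[k] = 2.0 / sigma2 * A[k] * (y[k] - interference)
            m[k] = math.tanh((llr0[k] + ext[k]) / 2.0)
        trace.append(free_energy(m, prior_mean, A, R, y, sigma2))
    return m, ext, trace

# Hypothetical three-user example with noiseless y = R A b, b = [+1, -1, +1].
R = [[1.0, 0.3, 0.2], [0.3, 1.0, 0.1], [0.2, 0.1, 1.0]]
A = [1.0, 1.0, 1.0]
y = [0.9, -0.6, 1.1]
m, ext, trace = discrete_siso([0.1, -0.2, 0.0], A, R, y, sigma2=0.5)
```

Because each scalar update exactly minimizes the free energy along one coordinate, the recorded energy trace should never increase from one sweep to the next.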

In the above proof, we have set $\sigma^2$ to be the channel noise variance, and assumed it known.
This is in contrast to Alexander-Grant-Reed's original derivation, where $\sigma_{tot}^2$ is the noise-plus-MAI
variance which has to be estimated iteratively. We will postpone the discussion of this issue until
Section 5.3, where we will show that the iterative estimation of $\sigma_{tot}^2$ can be interpreted as the M
step in the variational EM algorithm for joint data detection and noise variance estimation.

Also from (49), an interesting link to uncoded multi-stage SIC can be made - in that case,
$\mathrm{LLR}^{(0)}(b_k) = 0$. Defining $\hat{b}_k^{(i)} = m_k^{(i)} = \tanh[\mathrm{LLR}^{(i)}(b_k)/2]$, we get the hyperbolic-tangent SIC
updates

$$\hat{b}_k^{(i)} = \tanh\left\{\frac{1}{\sigma^2}A_k\mathbf{s}_k^T\left(\mathbf{r} - \mathbf{S}\mathbf{A}\hat{\mathbf{b}}_{<k}^{(i)} - \mathbf{S}\mathbf{A}\hat{\mathbf{b}}_{>k}^{(i-1)}\right)\right\}. \tag{52}$$

In addition to demonstrating a solid theoretical foundation for the Alexander-Grant-Reed
scheme, this section also clearly revealed the underlying suboptimal simplifications made en route
to the final result. In the following, we will compare it to the standard discrete SISO MUD based
on the theory of VFEM and the associated scheduling rules.

4.5.2 The Standard Forms of Discrete SISO MUD

In Table 2 we summarize three different versions of the standard discrete SISO MUD. The following
highlights the major characteristics of each scheme.

Table 2: Three scheduling schemes of turbo MUD employing discrete SISO MUD.

Sequential-Discrete-SISO

Initialization: $\mathbf{m} = \mathbf{0}$ and $\mathrm{LLR}_{dec}(b_k) = 0$ for all $k$
FOR j = 1 : J (Outer Iteration)
    FOR k = 1 : K
        $\mathrm{LLR}_{dec}(b_k) = 0$
        FOR i = 1 : I (Inner Iteration)
            FOR l = k : K, 1 : k - 1
                $\mathrm{LLR}_{mud}(b_l) = \frac{2}{\sigma^2}\left[\boldsymbol{\eta}_l^T\mathbf{r} - \boldsymbol{\beta}_l^T\mathbf{m}\right]$
                $\mathrm{LLR}_{pos}(b_l) = \mathrm{LLR}_{dec}(b_l) + \mathrm{LLR}_{mud}(b_l)$
                $m_l = \tanh[\mathrm{LLR}_{pos}(b_l)/2]$
            END
        END
        $\mathrm{LLR}_{dec}(b_k) \xLeftarrow{\text{Decoding}} \mathrm{LLR}_{mud}(b_k)$
    END
END

Flooding-Discrete-SISO

Initialization: $\mathbf{m} = \mathbf{0}$ and $\mathrm{LLR}_{dec}(b_k) = 0$ for all $k$
FOR j = 1 : J (Outer Iteration)
    FOR i = 1 : I (Inner Iteration)
        FOR k = 1 : K
            $\mathrm{LLR}_{pos}(b_k) = \mathrm{LLR}_{dec}(b_k) + \frac{2}{\sigma^2}\left[\boldsymbol{\eta}_k^T\mathbf{r} - \boldsymbol{\beta}_k^T\mathbf{m}\right]$
            $m_k = \tanh[\mathrm{LLR}_{pos}(b_k)/2]$
        END
    END
    FOR k = 1 : K
        $\mathrm{LLR}_{mud}(b_k) = \mathrm{LLR}_{pos}(b_k) - \mathrm{LLR}_{dec}(b_k)$
        $\mathrm{LLR}_{dec}(b_k) \xLeftarrow{\text{Decoding}} \mathrm{LLR}_{mud}(b_k)$
    END
END

Hybrid-Discrete-SISO

Initialization: $\mathbf{m} = \mathbf{0}$ and $\mathrm{LLR}_{dec}(b_k) = 0$ for all $k$
FOR j = 1 : J (Outer Iteration)
    FOR k = 1 : K
        $\mathrm{LLR}_{dec}(b_k) = 0$
        FOR i = 1 : I (Inner Iteration)
            FOR l = k : K, 1 : k - 1
                $\mathrm{LLR}_{mud}(b_l) = \frac{2}{\sigma^2}\left[\boldsymbol{\eta}_l^T\mathbf{r} - \boldsymbol{\beta}_l^T\mathbf{m}\right]$
                $\mathrm{LLR}_{pos}(b_l) = \mathrm{LLR}_{dec}(b_l) + \mathrm{LLR}_{mud}(b_l)$
                $m_l = \tanh[\mathrm{LLR}_{pos}(b_l)/2]$
            END
        END
    END
    FOR k = 1 : K
        $\mathrm{LLR}_{dec}(b_k) \xLeftarrow{\text{Decoding}} \mathrm{LLR}_{mud}(b_k)$
    END
END

Sequential-Discrete-SISO: The sequential schedule obtains the EXT for $b_k$ through a serial
update algorithm governed by (50). Before the inner iterations, $\{\mathrm{LLR}_{dec}(b_l)\}_{l \ne k}$ are set to the
most recent output from the APP decoder, except for $\mathrm{LLR}_{dec}(b_k)$, which is set to 0. This is
equivalent to setting $\xi_k = 1/2$ in (43), as required by the sequential scheduling rule.

After the serial update, $\mathrm{LLR}_{mud}(b_k)$ is immediately sent to the $k$-th APP decoder ($\{\mathrm{LLR}_{mud}(b_l)\}_{l \ne k}$
are discarded), such that an updated prior $\mathrm{LLR}_{dec}(b_k)$ is generated (see Fig. 2). The sequential
schedule is inefficient, since a different serial update of $\mathrm{LLR}_{mud}(b_k)$ needs to be done $K$ times, once
for each user. A SIC-based turbo MUD scheme proposed by Kobayashi, Boutros, and Caire [31]
can be seen as a simplification of the full-blown sequential-discrete-SISO MUD, with $I = 1$ inner
iteration.

Flooding-Discrete-SISO: The flooding schedule is much more efficient. The serial update
algorithm in the inner iteration updates the posterior LLR's, $\{\mathrm{LLR}_{pos}(b_k)\}_{k=1}^{K}$. After $I$ iterations, in
which the free energy is monotonically reduced, reliable estimates of $\{\mathrm{LLR}_{pos}(b_k)\}_{k=1}^{K}$ are attained.
The SISO detector passes the EXT, $\mathrm{LLR}_{mud}(b_k)$, into the APP decoder, where the decoding of $K$
users can be done in parallel.

Hybrid-Discrete-SISO: The hybrid schedule differs from the sequential schedule in that when
$\mathrm{LLR}_{mud}(b_k)$ is found, it is not immediately sent to the APP decoder to update $\mathrm{LLR}_{dec}(b_k)$, but
stored until all other users' EXT's are obtained, to facilitate parallel APP decoding.



4.6 VFEM Interpretation of Decorrelating-Decision-Feedback SISO Multiuser 
Detector 



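In [32], Duel-Hallen proposed the decorrelating-decision-feedback (DDF) multiuser detector. It has
been shown to out-perform most of the linear and interference-cancellation detectors, especially
in terms of near-far resistance. However, the soft-decision DDF MUD and its application within
the turbo MUD framework are relatively unknown. In this section, we will propose a SISO DDF
multiuser detector using the VFEM principle. The subsequent discussion will allow new insights
and new algorithms, including an interesting link to the discrete SISO detector discussed earlier.
Consider applying the VFEM routine to the following postulated distributions:

$$\begin{cases} p(\mathbf{b}) = \prod_{k=1}^{K}\xi_k^{\frac{1+b_k}{2}}(1 - \xi_k)^{\frac{1-b_k}{2}}, & b_k \in \{\pm 1\} \\ p(\bar{\mathbf{y}}|\mathbf{b}) = \mathcal{N}(\mathbf{F}\mathbf{A}\mathbf{b}, \sigma^2\mathbf{I}) \\ Q(\mathbf{b}) = \prod_{k=1}^{K}\gamma_k^{\frac{1+b_k}{2}}(1 - \gamma_k)^{\frac{1-b_k}{2}}, & b_k \in \{\pm 1\}. \end{cases} \tag{53}$$

Notice that these distributions are identical to the discrete SISO case in (43), except that the
received vector $\mathbf{r}$ is replaced by its sufficient statistics $\bar{\mathbf{y}}$ (defined earlier). Therefore, we may
directly make use of the derivation in Section 4.5 and arrive at an iterative detector similar to (48):

$$\log\frac{1 + m_k}{1 - m_k} = \log\frac{1 + \tilde{b}_k}{1 - \tilde{b}_k} + \frac{2}{\sigma^2}\left[\boldsymbol{\eta}_k^T\bar{\mathbf{y}} - \boldsymbol{\beta}_k^T\mathbf{m}\right], \tag{54}$$

where $\boldsymbol{\eta}_k$ and $\boldsymbol{\beta}_k$ are the $k$-th column vectors of $\mathbf{F}\mathbf{A}$ and $\mathbf{A}^T\mathbf{F}^T\mathbf{F}\mathbf{A} - \mathrm{diag}(\mathbf{A}^T\mathbf{F}^T\mathbf{F}\mathbf{A})$, respectively.
(54) is in fact identical to (48), since $\mathbf{F}^T\bar{\mathbf{y}} = \mathbf{S}^T\mathbf{r} = \mathbf{y}$ and $\mathbf{F}^T\mathbf{F} = \mathbf{S}^T\mathbf{S} = \mathbf{R}$.

Consider the uncoded scenario, i.e., $\tilde{b}_k = 0$ for $k = 1, \cdots, K$, then (54) reduces to

$$m_k = \tanh\left\{\frac{1}{\sigma^2}\left(\boldsymbol{\eta}_k^T\bar{\mathbf{y}} - \boldsymbol{\beta}_k^T\mathbf{m}\right)\right\}. \tag{55}$$

The free energy is monotonically reduced if $m_k$ is evaluated in a SIC fashion similar to (49), i.e.,
in the $i$-th iteration:

$$m_k^{(i)} = \tanh\left\{\frac{1}{\sigma^2}\left(\boldsymbol{\eta}_k^T\bar{\mathbf{y}} - \boldsymbol{\beta}_k^T\mathbf{m}_{<k}^{(i)} - \boldsymbol{\beta}_k^T\mathbf{m}_{>k}^{(i-1)}\right)\right\}. \tag{56}$$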



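Now we take a crucial step that will produce the DDF SISO detector based on (56). We will
alter the definition of $\boldsymbol{\eta}_k$ and $\boldsymbol{\beta}_k$, by replacing $\mathbf{F}$ with a new matrix $\mathbf{F}_k$. Let $\mathbf{F}_k$ be $\mathbf{F}$, except with
elements $F_{k+1,k}$ to $F_{K,k}$ nulled, i.e.,

$$\mathbf{F}_k = \begin{bmatrix} F_{1,1} & & & & & & \\ F_{2,1} & F_{2,2} & & & & & \\ \vdots & & \ddots & & & & \\ F_{k,1} & \cdots & F_{k,k-1} & F_{k,k} & & & \\ F_{k+1,1} & \cdots & F_{k+1,k-1} & 0 & F_{k+1,k+1} & & \\ \vdots & & \vdots & \vdots & \vdots & \ddots & \\ F_{K,1} & \cdots & F_{K,k-1} & 0 & F_{K,k+1} & \cdots & F_{K,K} \end{bmatrix}. \tag{57}$$

Then we let $\boldsymbol{\eta}_k$ and $\boldsymbol{\beta}_k$ be the $k$-th column vectors of $\mathbf{F}_k\mathbf{A}$ and $\mathbf{A}^T\mathbf{F}_k^T\mathbf{F}_k\mathbf{A} - \mathrm{diag}(\mathbf{A}^T\mathbf{F}_k^T\mathbf{F}_k\mathbf{A})$,
respectively. Subsequently, we see that

$$\boldsymbol{\eta}_k = [0, \cdots, 0, A_kF_{k,k}, 0, \cdots, 0]^T \tag{58}$$

$$\boldsymbol{\beta}_k = A_kF_{k,k}[A_1F_{k,1}, \cdots, A_{k-1}F_{k,k-1}, 0, \cdots, 0]^T, \tag{59}$$

and

$$\boldsymbol{\eta}_k^T\bar{\mathbf{y}} = A_kF_{k,k}\bar{y}_k, \qquad \boldsymbol{\beta}_k^T\mathbf{m}_{>k} = 0. \tag{60}$$

Hence (56) becomes

$$m_k = \tanh\left\{\frac{1}{\sigma^2}\left(A_kF_{k,k}\bar{y}_k - \boldsymbol{\beta}_k^T\mathbf{m}_{<k}\right)\right\}. \tag{61}$$

Finally, it is not difficult to recognize that the argument inside the $\tanh(\cdot)$ function is the DDF
detector output, scaled by $1/\sigma^2$. The additional tanh function has its intuitive appeal, since
it provides soft bit estimates for BPSK in an AWGN channel (assuming perfect cancellation of
interference).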

We have therefore derived a soft-decision DDF detector, obtaining it based on the discrete SISO
MUD, and by replacing the matrix $\mathbf{F}$ with $\mathbf{F}_k$ in the free energy minimization stage. If informative
priors, $\{\mathrm{LLR}^{(0)}(b_k)\}_{k=1}^{K}$, are used, (61) simply becomes

$$\mathrm{LLR}_{pos}(b_k) = \mathrm{LLR}^{(0)}(b_k) + \frac{2}{\sigma^2}\left(A_kF_{k,k}\bar{y}_k - \boldsymbol{\beta}_k^T\mathbf{m}_{<k}\right), \tag{62}$$

where $m_k = \tanh[\mathrm{LLR}_{pos}(b_k)/2]$. But the EXT is unchanged, since

$$\mathrm{LLR}_{mud}(b_k) = \mathrm{LLR}_{pos}(b_k) - \mathrm{LLR}^{(0)}(b_k) = \frac{2}{\sigma^2}\left(A_kF_{k,k}\bar{y}_k - \boldsymbol{\beta}_k^T\mathbf{m}_{<k}\right). \tag{63}$$

Definition 3 A DDF SISO Multiuser Detector is a multiuser detector that obtains soft
estimates $Q(\mathbf{b})$ through the VFEM routine, subject to the following postulated distributions:

$$\begin{cases} p(\mathbf{b}) = \prod_{k=1}^{K}\xi_k^{\frac{1+b_k}{2}}(1 - \xi_k)^{\frac{1-b_k}{2}}, & b_k \in \{\pm 1\} \\ p(\bar{\mathbf{y}}|\mathbf{b}) = \mathcal{N}(\mathbf{F}\mathbf{A}\mathbf{b}, \sigma^2\mathbf{I}) \\ Q(\mathbf{b}) = \prod_{k=1}^{K}\gamma_k^{\frac{1+b_k}{2}}(1 - \gamma_k)^{\frac{1-b_k}{2}}, & b_k \in \{\pm 1\}, \end{cases} \tag{64}$$

and replacing $\mathbf{F}$ with $\mathbf{F}_k$ as in (57) in the free energy minimization stage.

To understand the effect of transforming $\mathbf{F}$ to $\mathbf{F}_k$ in simple terms, we first need to realize that
$\boldsymbol{\eta}_k$ acts as a matched filter on $\bar{\mathbf{y}}$ to extract the decision metric for $b_k$, while $\boldsymbol{\beta}_k$ indicates the amount
of interference to be subtracted from $\boldsymbol{\eta}_k^T\bar{\mathbf{y}}$ to improve estimation. By defining $\mathbf{F}_k$ as in (57), we
heuristically assume $\boldsymbol{\eta}_k$ to be non-zero only in the $k$-th element, essentially ignoring the presence
of $b_k$ in $\bar{y}_{k+1}, \cdots, \bar{y}_K$. This simplification also makes $\boldsymbol{\beta}_k$ non-zero only in the first $k - 1$ elements,
implying that only the estimates for $b_1, \cdots, b_{k-1}$ need to be subtracted for the detection of $b_k$.

In this section, we have shown that the same free energy expression specifies the DDF SISO
MUD and discrete SISO MUD. But in DDF SISO MUD, we replaced $\mathbf{F}$ with $\mathbf{F}_k$ in the execution
of the coordinate descent algorithm to enable a decision-feedback structure.
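A single pass of the DDF SISO detector in (62) and (63) can be sketched as below. The sketch assumes that $\mathbf{F}$ is the lower-triangular factor satisfying $\mathbf{R} = \mathbf{F}^T\mathbf{F}$ and that the whitened statistic is obtained by solving $\mathbf{F}^T\bar{\mathbf{y}} = \mathbf{y}$, consistent with the relations $\mathbf{F}^T\bar{\mathbf{y}} = \mathbf{y}$ and $\mathbf{F}^T\mathbf{F} = \mathbf{R}$ stated above; the function names, the bottom-up factorization routine, and the toy numbers are hypothetical.

```python
import math

def lower_factor(R):
    """Lower-triangular F with R = F^T F, built row by row from the bottom."""
    K = len(R)
    F = [[0.0] * K for _ in range(K)]
    for k in range(K - 1, -1, -1):
        F[k][k] = math.sqrt(R[k][k] - sum(F[l][k] ** 2 for l in range(k + 1, K)))
        for i in range(k):
            cross = sum(F[l][i] * F[l][k] for l in range(k + 1, K))
            F[k][i] = (R[i][k] - cross) / F[k][k]
    return F

def whiten(F, y):
    """Solve F^T ybar = y by back substitution (F^T is upper triangular)."""
    K = len(y)
    ybar = [0.0] * K
    for k in range(K - 1, -1, -1):
        ybar[k] = (y[k] - sum(F[l][k] * ybar[l] for l in range(k + 1, K))) / F[k][k]
    return ybar

def ddf_siso(R, A, y, sigma2, prior_llr=None):
    """One pass of (62)-(63): returns soft estimates m and extrinsic LLRs."""
    K = len(A)
    prior_llr = prior_llr or [0.0] * K
    F = lower_factor(R)
    ybar = whiten(F, y)
    m, ext = [0.0] * K, [0.0] * K
    for k in range(K):
        # Only users detected earlier (j < k) are fed back, as implied by (59).
        feedback = sum(A[j] * F[k][j] * m[j] for j in range(k))
        ext[k] = 2.0 / sigma2 * A[k] * F[k][k] * (ybar[k] - feedback)
        m[k] = math.tanh((prior_llr[k] + ext[k]) / 2.0)
    return m, ext

# Hypothetical two-user example: strong user first, noiseless y = R A b, b = [+1, -1].
R2 = [[1.0, 0.7], [0.7, 1.0]]
m, ext = ddf_siso(R2, [2.0, 1.0], [1.3, 0.4], sigma2=0.5)
```

The detection order is the index order, so the first user sees no feedback and relies entirely on the decorrelated statistic $\bar{y}_1$.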

4.7 DDF-Aided Discrete SISO MUD 

Having discussed both Gaussian SISO MUD and discrete SISO MUD, a natural question to ask is
how the two compare with each other in complexity and performance. In general, it is well-known
that Gaussian SISO MUD is a robust algorithm, but has relatively high complexity. This is easy
to see from the VFEM viewpoint, since Gaussian SISO MUD minimizes $\mathcal{F}_{gauss}(\boldsymbol{\mu}, \boldsymbol{\Sigma})$ exactly in
the free energy minimization stage. But solving the optimization problem exactly entails higher
complexity due to the need for matrix inversion.

Discrete SISO MUD, on the other hand, decreases $\mathcal{F}_{disc}(\mathbf{m})$ iteratively through a SIC-like
procedure, which has only linear complexity in $K$. However, since $\mathcal{F}_{disc}(\mathbf{m})$ is not a convex function,
serial updates of this form can become trapped in local minima, resulting in poor detection
performance. This is the reason why detection schemes based on discrete SISO MUD only work in systems
with small spreading code correlations, such as random spreading CDMA systems [2], [31], but not
in high-interference channels, such as those considered in [3] with the spreading code correlation
between different users being $\rho = 0.7$.

As is shown in Section 4.6, the DDF SISO MUD has the same target free energy as the discrete
SISO MUD. But through a small modification to the coordinate descent step, the DDF SISO MUD
is able to overcome local minima and bring the free energy close to its minimum. The only problem
is that, due to the substitution of $\mathbf{F}_k$ for $\mathbf{F}$, it cannot refine its estimate iteratively, and hence does
not settle at the exact global minimum of $\mathcal{F}_{disc}(\mathbf{m})$.

To combine the good convergence property of the DDF SISO MUD and the optimality promised
by the discrete SISO MUD, we will introduce a so-called DDF-aided discrete SISO detector as
follows.

Definition 4 A DDF-Aided Discrete SISO Multiuser Detector is a multiuser detector that
obtains soft estimates $Q(\mathbf{b})$ through the VFEM routine, subject to the postulated distributions
in (64). It is implemented by replacing the first iteration of the discrete SISO MUD algorithm
with DDF SISO MUD.

This detector utilizes the good initialization of DDF SISO MUD, and implements discrete
SISO MUD in subsequent iterations to drive the free energy even lower, and produce improved
performance. We will demonstrate how the DDF-aided discrete SISO MUD improves upon both
the original discrete SISO MUD and DDF SISO MUD through a simple simulation based on
Example 1 in [32].

Consider an uncoded two-user synchronous CDMA system with spreading-sequence cross-correlation
$\rho = 0.7$. We let the signal-to-noise ratio (SNR) of user 1 (strong user) be no smaller than that
of user 2 (weak user), i.e., SNR(1) $\ge$ SNR(2). Fixing SNR(2) at 11 dB and varying SNR(1), we
obtain the bit error rate (BER) of user 2 shown in the accompanying BER figure. In panel (a), we detect
the strong user first, and the DDF-aided discrete SISO MUD is implemented with the DDF SISO MUD followed
by four discrete SISO MUD iterations. It is seen that in this case, the additional discrete SISO
MUD iterations do not offer performance enhancement, and thus the DDF-aided discrete SISO MUD
performs nearly identically to DDF SISO MUD. The use of discrete SISO MUD alone suffers from
poor convergence as predicted, and is omitted in the plot.

However, like Duel-Hallen's conventional DDF detector, the DDF SISO detector is sensitive to the
detection order. This means that if the weak user is detected first, the near-far resistance property no
longer holds. Such an effect is depicted in panel (b) of the same figure, where we detect the weak user first. By fixing
SNR(2) at 11 dB and varying SNR(1), the BERs of different schemes are plotted. It is seen that the
DDF SISO MUD no longer approaches near-optimal performance as the SNR difference increases,
while the DDF-aided discrete SISO MUD continues to demonstrate good near-far resistance even
with non-optimal detection order. This is because the additional discrete SISO MUD iterations
rectify the performance degradation of DDF SISO MUD due to unfavourable detection order.
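The mechanism behind this observation can be illustrated with a small deterministic sketch for two users with uniform priors: one DDF SISO pass (61) initializes $\mathbf{m}$, and subsequent discrete SISO sweeps (49) keep lowering $\mathcal{F}_{disc}(\mathbf{m})$. This is not a reproduction of the reported simulation; the amplitudes, the noiseless observation, and all function names are hypothetical, and the weak user is deliberately detected first.

```python
import math

def xlogx(p):
    return p * math.log(p) if p > 0.0 else 0.0

def free_energy(m, rho, A, y, sigma2):
    """F_disc(m) of (46) for two users and uniform priors, up to constants."""
    entropy = sum(xlogx((1 + v) / 2) + xlogx((1 - v) / 2) for v in m)
    linear = A[0] * y[0] * m[0] + A[1] * y[1] * m[1]
    cross = 2 * rho * A[0] * A[1] * m[0] * m[1]              # m^T B m
    return entropy + (cross - 2 * linear) / (2 * sigma2)

def ddf_pass(rho, A, y, sigma2):
    """One DDF SISO pass (61); F = [[sqrt(1 - rho^2), 0], [rho, 1]] gives R = F^T F."""
    f00 = math.sqrt(1.0 - rho * rho)
    ybar1 = y[1]                                             # solve F^T ybar = y
    ybar0 = (y[0] - rho * ybar1) / f00
    m0 = math.tanh(A[0] * f00 * ybar0 / sigma2)
    m1 = math.tanh(A[1] * (ybar1 - rho * A[0] * m0) / sigma2)
    return [m0, m1]

def discrete_sweep(m, rho, A, y, sigma2):
    """One serial sweep of the discrete SISO update (49) with uniform priors."""
    m = list(m)
    for k in (0, 1):
        other = 1 - k
        m[k] = math.tanh(A[k] * (y[k] - rho * A[other] * m[other]) / sigma2)
    return m

# Hypothetical setting: weak user (index 0) detected first, noiseless y = R A b, b = [+1, -1].
rho, A, sigma2 = 0.7, [1.0, 3.0], 0.5
y = [1.0 - rho * 3.0, rho - 3.0]
m_init = ddf_pass(rho, A, y, sigma2)
m_refined = list(m_init)
energies = [free_energy(m_init, rho, A, y, sigma2)]
for _ in range(4):                                           # four discrete SISO iterations
    m_refined = discrete_sweep(m_refined, rho, A, y, sigma2)
    energies.append(free_energy(m_refined, rho, A, y, sigma2))
```

In this toy setting the weak user's soft estimate is only moderately reliable after the DDF pass, and the discrete sweeps sharpen it once the strong user's interference has been cancelled.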

These simple examples reveal that the DDF-aided discrete SISO MUD is a much more powerful
detection scheme than both discrete SISO MUD and DDF SISO MUD. It can be viewed
as either the multiple-iteration extension of DDF SISO MUD, or as a discrete SISO MUD with
convergence acceleration. The DDF-aided discrete SISO MUD is thus
capable of coping with strong-interference channels such as the ones assumed in [3], which will be
studied in Section 

4.8 Summary 

Table 3: Variational-inference-based multiuser detectors employing different scheduling schemes.

              Gaussian SISO    Discrete SISO
Sequential         *               [31]
Flooding           *               [2]
Hybrid            [3]               *

Table 3 categorizes some of the existing turbo multiuser detectors according to the standard-form
SISO MUD schemes outlined in Tables 1 and 2. The symbol * indicates the schemes that are outcomes of
the general framework, but not seen in the literature. We now outline how the existing schemes fit
into the categories created.

• [3] is identical to the hybrid-Gaussian-SISO MUD, but it is re-derived in Section 4.4.1 via
a completely different VFEM-based approach. With the help of the insights offered by the
VFEM framework, we are able to further extend [3] to sequential and flooding schedule
implementations, both explained in Section 4.4.2.

• [31] can be seen as the standard sequential-discrete-SISO MUD with sequential scheduling,
and $I = 1$ inner iteration. Moreover, [31] considers the joint estimation of noise-plus-MAI
variance and channel amplitude. Like the noise-plus-MAI variance estimation in [2], this can
also be studied within the VFEM framework, as an instance of the variational EM algorithm
discussed in Section 

• [2] can be seen as a simplified version of flooding-discrete-SISO MUD, but it differs from
the standard approach in two aspects: 1) [2] uses parallel updates of each user's bit LLR,
which does not guarantee the reduction of free energy. 2) [2] uses the posterior estimate
(instead of the extrinsic information) from the APP decoder as the initialization of $\mathbf{m}$, a
practice that is suboptimal from the message-passing standpoint.

5 Variational EM for Iterative Parameter Estimation 

In recent years, the impact of imperfect channel estimation on uncoded and coded multiuser
detection has been studied in the large system limit [33, 34, 35]. However, the alleviation of this problem
has rarely been systematically investigated in the literature. In this section, we will introduce an
important extension of variational inference, called variational EM, to enable joint parameter
estimation in turbo MUD. The two examples in Sections 5.2 and 5.3, based on the Gaussian SISO
detector and discrete SISO detector, respectively, will illustrate how the variational EM framework
provides a feasible solution to practical turbo receiver design when exact channel state information
(CSI) is unavailable.

5.1 Formulation of Variational EM Algorithm 

Like most detectors, variational-inference-based detection schemes assume perfect knowledge of
system parameters, such as various types of channel information. In practice, however, these
parameters may not be known accurately at the receiver. One traditional way to incorporate the
uncertainties of these parameters in the detection operation is through the EM algorithm [36, 37].
The EM algorithm is used to estimate a vector of parameters, say $\theta$, from the observation
$\mathbf{r}$ that is termed "incomplete data", together with some auxiliary or hidden variable, say
$\mathbf{b}$. The algorithm iteratively carries out two operations: the E step and the M step. The
j-th iteration effectively computes a probability density $p(\mathbf{b}|\mathbf{r}, \hat{\theta}^{(j-1)})$
in the E step, where $\hat{\theta}^{(j-1)}$ is the estimate of $\theta$ in the previous iteration,
and then in the M step maximizes

$$\int_{\mathbf{b}} p(\mathbf{b}|\mathbf{r}, \hat{\theta}^{(j-1)}) \log p(\mathbf{b}, \mathbf{r}|\theta) \, d\mathbf{b} \tag{65}$$

over $\theta$, yielding $\hat{\theta}^{(j)}$.

The work in [15] shows that the EM algorithm is equivalent to jointly estimating the hidden
variables and parameters by minimizing a single free-energy expression over a postulated
distribution for the hidden variables, and over the parameters. The VFEM formulation offers an
additional degree of freedom to the conventional EM algorithm, such that in the E step, an
approximate posterior $Q(\mathbf{b}|\hat{\theta}^{(j-1)})$ may be used to replace the exact posterior
$p(\mathbf{b}|\mathbf{r}, \hat{\theta}^{(j-1)})$. Variational EM has been successfully applied in
various applications, e.g., in image processing to perform scene analysis [38], and in joint
detection/estimation problems in wireless channels [39, 40].

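To provide a concrete example, assume $\theta$ remains static over T independent uses of the channel.
In the context of MUD, this implies that we assume $\theta$, the noise variance $\sigma^2$ for
example, remains constant when a block of T bits is transmitted by each user (T could be the
codeword length). The variational EM algorithm extracts point estimates for $\theta$ and postulated
posterior distributions over the channel bits. Therefore, the new Q-function may be written as:

$$Q(\mathbf{b}_1, \cdots, \mathbf{b}_T, \theta) = \delta(\theta - \hat{\theta}) \prod_{t=1}^{T} Q(\mathbf{b}_t), \tag{66}$$

where $\hat{\theta}$ is an estimate of $\theta$, and $\mathbf{b}_t$ contains the channel bits
transmitted in the t-th use of the channel. The notation $\delta(\mathbf{a} - \hat{\mathbf{a}})$ denotes
a vector Dirac delta function with the following properties:
$\int \delta(\mathbf{a} - \hat{\mathbf{a}}) f(\mathbf{a}) \, d\mathbf{a} = f(\hat{\mathbf{a}})$, and
$\int \delta(\mathbf{a} - \hat{\mathbf{a}}) \, d\mathbf{a} = 1$. Recall that for i.i.d. data,
$p(\mathbf{b}, \theta, \mathbf{r}) = p(\theta) \prod_{t=1}^{T} p(\mathbf{b}_t, \mathbf{r}_t|\theta)$.
Substituting (66) into the definition of the variational free energy yields:

$$
\begin{aligned}
\mathcal{F}
&= \int_{\theta} \int_{\mathbf{b}} \delta(\theta - \hat{\theta}) \prod_{t=1}^{T} Q(\mathbf{b}_t)
   \log \frac{\delta(\theta - \hat{\theta}) \prod_{t=1}^{T} Q(\mathbf{b}_t)}
             {p(\theta) \prod_{t=1}^{T} p(\mathbf{b}_t, \mathbf{r}_t|\theta)} \, d\mathbf{b} \, d\theta \\
&= \int_{\theta} \delta(\theta - \hat{\theta}) \log \delta(\theta - \hat{\theta}) \, d\theta
   - \int_{\theta} \delta(\theta - \hat{\theta}) \log p(\theta) \, d\theta
   + \int_{\mathbf{b}} \prod_{t=1}^{T} Q(\mathbf{b}_t)
     \log \frac{\prod_{t=1}^{T} Q(\mathbf{b}_t)}{\prod_{t=1}^{T} p(\mathbf{b}_t, \mathbf{r}_t|\hat{\theta})} \, d\mathbf{b} \\
&= -\log p(\hat{\theta})
   + \sum_{t=1}^{T} \int_{\mathbf{b}_t} Q(\mathbf{b}_t)
     \log \frac{Q(\mathbf{b}_t)}{p(\mathbf{b}_t, \mathbf{r}_t|\hat{\theta})} \, d\mathbf{b}_t .
\end{aligned}
\tag{67}
$$
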
In the last line of the above equation we omit the constant term
$\int \delta(\theta - \hat{\theta}) \log \delta(\theta - \hat{\theta}) \, d\theta$. The
term $p(\theta)$ constitutes the prior knowledge of the parameter. In cases when it is not available,
we may set $p(\theta)$ = constant and ignore it in the minimization of free energy.

As proven in [15], alternating between minimizing (67) w.r.t. $\{Q(\mathbf{b}_t)\}_{t=1}^{T}$ in the
E step, and w.r.t. $\hat{\theta}$ in the M step leads to the exact EM algorithm, where
$\{\mathbf{b}_t\}_{t=1}^{T}$ are the "hidden variables" and $\theta$ is the unknown parameter of
interest. Unfortunately, exact EM is only possible in special cases, because the computation in the
E step of $Q(\mathbf{b}_t) = p(\mathbf{b}_t|\mathbf{r}_t, \hat{\theta})$
(s.t. $\int_{\mathbf{b}_t} Q(\mathbf{b}_t) \, d\mathbf{b}_t = 1$) is often intractable. But suppose we
use a postulated (and simple) distribution $Q(\mathbf{b}_t)$, with parameter $\lambda_t$, and then
find the $\lambda_t$ that minimizes (67). We then arrive at the variational EM algorithm, which
consists of the initialization plus the E step and M step in the j-th iteration:

Initialization  Choose initial values for $\hat{\theta}^{(0)}$.

E Step  Minimize $\mathcal{F}(\lambda_1, \cdots, \lambda_T, \hat{\theta}^{(j-1)})$ in (67) w.r.t. $\lambda_t$:

$$\lambda_t^{(j)} = \arg\min_{\lambda_t} \int_{\mathbf{b}_t} Q(\mathbf{b}_t)
  \log \frac{Q(\mathbf{b}_t)}{p(\mathbf{b}_t, \mathbf{r}_t|\hat{\theta}^{(j-1)})} \, d\mathbf{b}_t,
  \quad \text{for } t = 1, \cdots, T. \tag{68}$$

M Step  Minimize $\mathcal{F}(\lambda_1^{(j)}, \cdots, \lambda_T^{(j)}, \theta)$ in (67) w.r.t. $\theta$:

$$\hat{\theta}^{(j)} = \arg\min_{\theta} \sum_{t=1}^{T} \left( \int_{\mathbf{b}_t} Q^{(j)}(\mathbf{b}_t)
  \log \frac{Q^{(j)}(\mathbf{b}_t)}{p(\mathbf{b}_t, \mathbf{r}_t|\theta)} \, d\mathbf{b}_t \right)
  - \log p(\theta). \tag{69}$$

In the rest of the section, we will implement the variational EM algorithm in both Gaussian 
SISO MUD and discrete SISO MUD, to resolve the uncertainty in channel information at the 
receiver. More specifically, we will assume no noise variance information and noisy channel ampli- 
tude information at the receiver, and attempt to adaptively estimate the noise variance, as well as 
improve the channel amplitude estimation, jointly with data detection. 
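
To make the alternation between (68) and (69) concrete, the following minimal Python sketch applies it to a deliberately simple single-user BPSK channel $r_t = a b_t + n_t$ with known amplitude $a$ and unknown noise variance $\sigma^2$. The toy model, the function name `variational_em_noise_variance`, the initial value `sigma2_init`, and the iteration count are illustrative assumptions and are not the multiuser schemes studied in this paper. Because there is only one user, the postulated Bernoulli distribution with mean $m_t$ is exact, so the sketch coincides with exact EM; $p(\sigma^2)$ is taken to be constant.

```python
import numpy as np

def variational_em_noise_variance(r, a=1.0, sigma2_init=1.0, num_iters=100):
    """Toy variational EM: estimate the noise variance of r_t = a*b_t + n_t, b_t in {-1, +1}."""
    r = np.asarray(r, dtype=float)
    sigma2 = float(sigma2_init)
    for _ in range(num_iters):
        # E step, cf. (68): Q(b_t) is Bernoulli with mean m_t = tanh(a * r_t / sigma2).
        m = np.tanh(a * r / sigma2)
        # M step, cf. (69) with constant prior: sigma2 = average of E_Q[(r_t - a*b_t)^2],
        # where E_Q[(r_t - a*b_t)^2] = (r_t - a*m_t)^2 + a^2 * (1 - m_t^2).
        sigma2 = float(np.mean((r - a * m) ** 2 + a ** 2 * (1.0 - m ** 2)))
    return sigma2, m

# Synthetic data (hypothetical values chosen only for illustration).
rng = np.random.default_rng(0)
T = 20000
true_sigma2 = 0.5
b = rng.choice([-1.0, 1.0], size=T)
r = b + np.sqrt(true_sigma2) * rng.standard_normal(T)

sigma2_hat, m = variational_em_noise_variance(r)
print("estimated noise variance:", sigma2_hat)
```

In the multiuser detectors of the following subsections the same two-step pattern applies, with the E step replaced by the Gaussian or discrete SISO MUD update and the M step extended to the channel amplitudes.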



5.2 Channel and Noise Variance Estimation for Gaussian SISO MUD 

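Adding a time index t to the channel model to represent a sequence of channel observations
$\mathbf{r}_t = \mathbf{S}\mathbf{A}\mathbf{b}_t + \mathbf{n}_t$ ($t = 1, \cdots, T$), according to (67),
we may write the free energy for T channel realizations as

$$\mathcal{F}_{\mathrm{gauss}}(\boldsymbol{\mu}_1, \cdots, \boldsymbol{\mu}_T, \boldsymbol{\Sigma}_1, \cdots, \boldsymbol{\Sigma}_T, \sigma^2, \mathbf{a})
  = -\log p(\sigma^2) - \log p(\mathbf{a})
  + \sum_{t=1}^{T} \mathcal{F}_{\mathrm{gauss}}(\boldsymbol{\mu}_t, \boldsymbol{\Sigma}_t|\sigma^2, \mathbf{a}), \tag{70}$$

where $\mathbf{a} = \mathrm{diag}(\mathbf{A})$, and we define

$$\mathcal{F}_{\mathrm{gauss}}(\boldsymbol{\mu}_t, \boldsymbol{\Sigma}_t|\sigma^2, \mathbf{a})
  = \int_{\mathbf{b}_t} Q(\mathbf{b}_t) \log \frac{Q(\mathbf{b}_t)}{p(\mathbf{b}_t, \mathbf{r}_t|\sigma^2, \mathbf{a})} \, d\mathbf{b}_t, \tag{71}$$

which is equal to (29), except that $\sigma^2$ and $\mathbf{a}$ are now explicitly shown to be variables
of $\mathcal{F}$. Here $\theta = \{\sigma^2, \mathbf{a}\}$ are the model parameters to be estimated.
In (70), $p(\sigma^2)$ is a constant, since we do not assume prior knowledge about $\sigma^2$. But
estimates of the channel, however noisy, can be assumed available at the receiver. In particular, we
model the true channel, $\mathbf{a}$, as the sum of the channel estimate, $\tilde{\mathbf{a}}$, and
random measurement error with variance $\varsigma^2$ [5T]. Or, equivalently,
$p(\mathbf{a}) = \mathcal{N}(\tilde{\mathbf{a}}, \varsigma^2 \mathbf{I})$, where $\varsigma$ is assumed
to be known.

The E step, i.e., the estimation of $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$, has already been
completed in Section 4.4. The only challenge now remaining is the M step. From (69), we see that we
are required to solve for

$$\{\hat{\sigma}^2, \hat{\mathbf{a}}\} = \arg\min_{\sigma^2, \mathbf{a}}
  \mathcal{F}_{\mathrm{gauss}}(\boldsymbol{\mu}_1, \cdots, \boldsymbol{\mu}_T, \boldsymbol{\Sigma}_1, \cdots, \boldsymbol{\Sigma}_T, \sigma^2, \mathbf{a}). \tag{72}$$
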
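To this end, the following identity is needed:

Lemma 1

$$\mathrm{tr}[\mathrm{diag}(\mathbf{x}) \cdot \mathbf{A} \cdot \mathrm{diag}(\mathbf{y}) \cdot \mathbf{B}]
  = \mathbf{x}^T (\mathbf{A} \circ \mathbf{B}^T) \mathbf{y} \tag{73}$$

for square matrices $\mathbf{A}, \mathbf{B} \in \mathbb{R}^{N \times N}$, and vectors
$\mathbf{x}, \mathbf{y} \in \mathbb{R}^{N \times 1}$.

Proof: Writing $\mathbf{A} = [A_{ij}]$ and $\mathbf{B} = [B_{ij}]$, it is easily verified that both
sides of the equation are equal to $\sum_{i,j} x_i A_{ij} B_{ji} y_j$. ∎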
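
Lemma 1 is easy to check numerically. The sketch below assumes NumPy and uses arbitrary random test matrices (the dimension N = 5 and the seed are illustrative choices, not values from this paper). It evaluates both sides of (73) together with the element-wise sum used in the proof.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 5  # arbitrary test dimension (assumption)
A = rng.standard_normal((N, N))
B = rng.standard_normal((N, N))
x = rng.standard_normal(N)
y = rng.standard_normal(N)

# Left-hand side of (73): tr[diag(x) A diag(y) B].
lhs = np.trace(np.diag(x) @ A @ np.diag(y) @ B)

# Right-hand side of (73): x^T (A o B^T) y, where "*" is the Hadamard product.
rhs = x @ (A * B.T) @ y

# Element-wise sum from the proof: sum_{i,j} x_i A_ij B_ji y_j.
elementwise = sum(x[i] * A[i, j] * B[j, i] * y[j]
                  for i in range(N) for j in range(N))

print(np.isclose(lhs, rhs), np.isclose(lhs, elementwise))
```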
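Utilizing Lemma 1 and ignoring the terms independent of $\sigma^2$ and $\mathbf{a}$, we have

$$
\begin{aligned}
\mathcal{F}_{\mathrm{gauss}}&(\boldsymbol{\mu}_1, \cdots, \boldsymbol{\mu}_T, \boldsymbol{\Sigma}_1, \cdots, \boldsymbol{\Sigma}_T, \sigma^2, \mathbf{a}) \\
&= \sum_{t=1}^{T} \left\{ \frac{1}{2\sigma^2} (\mathbf{r}_t - \mathbf{S}\mathbf{M}_t\mathbf{a})^T (\mathbf{r}_t - \mathbf{S}\mathbf{M}_t\mathbf{a})
   + \frac{N}{2} \log(\sigma^2)
   + \frac{1}{2\sigma^2} \mathbf{a}^T [(\mathbf{S}^T\mathbf{S}) \circ \boldsymbol{\Sigma}_t] \mathbf{a} \right\}
   + \frac{1}{2\varsigma^2} (\mathbf{a} - \tilde{\mathbf{a}})^T (\mathbf{a} - \tilde{\mathbf{a}}),
\end{aligned}
\tag{74}
$$

where $\mathbf{M}_t = \mathrm{diag}(\boldsymbol{\mu}_t)$ and $\tilde{\mathbf{a}}$ is the noisy channel
estimate available at the receiver. Equating $\partial \mathcal{F} / \partial \mathbf{a} = \mathbf{0}$
produces

$$\hat{\mathbf{a}} = \left\{ \sum_{t=1}^{T} \left[ \mathbf{M}_t \mathbf{S}^T \mathbf{S} \mathbf{M}_t
   + (\mathbf{S}^T\mathbf{S}) \circ \boldsymbol{\Sigma}_t \right] + \frac{\sigma^2}{\varsigma^2} \mathbf{I} \right\}^{-1}
   \left( \sum_{t=1}^{T} \mathbf{M}_t \mathbf{S}^T \mathbf{r}_t + \frac{\sigma^2}{\varsigma^2} \tilde{\mathbf{a}} \right). \tag{75}$$

Substituting $\mathbf{a} = \hat{\mathbf{a}}$ into $\mathcal{F}$ and solving
$\partial \mathcal{F}_{\mathrm{gauss}}(\boldsymbol{\mu}_1, \cdots, \boldsymbol{\mu}_T, \boldsymbol{\Sigma}_1, \cdots, \boldsymbol{\Sigma}_T, \sigma^2, \hat{\mathbf{a}}) / \partial \sigma^{-2} = 0$
gives

$$\hat{\sigma}^2 = \frac{1}{NT} \left\{ \sum_{t=1}^{T} \left[ (\mathbf{r}_t - \mathbf{S}\hat{\mathbf{A}}\boldsymbol{\mu}_t)^T (\mathbf{r}_t - \mathbf{S}\hat{\mathbf{A}}\boldsymbol{\mu}_t)
   + \hat{\mathbf{a}}^T [(\mathbf{S}^T\mathbf{S}) \circ \boldsymbol{\Sigma}_t] \hat{\mathbf{a}} \right] \right\}, \tag{76}$$

where $\hat{\mathbf{A}} = \mathrm{diag}(\hat{\mathbf{a}})$. Note that (75) and (76) decrease the free
energy in a coordinate descent manner, not necessarily minimizing it, due to the coupling of
$\hat{\mathbf{a}}$ and $\hat{\sigma}^2$ in the two equations. But this is acceptable, since they will
converge to the exact minimizers over the EM iterations.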

After incorporating iterative decoding, the variational EM algorithm for turbo MUD employing
flooding-Gaussian-SISO MUD is described in Table 4.

This is an extension to the flooding-Gaussian-SISO MUD algorithm in Table 1. The variational
EM extension to sequential or hybrid Gaussian-SISO MUD is straightforward, where the M step
may be implemented either once every outer iteration j, or more frequently, after the E step update
of each user.
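
As an illustration, the M-step updates for the channel amplitudes and the noise variance in this subsection can be sketched in NumPy as follows. The function name `gaussian_m_step`, the array layout (`r` of shape T x N, `mu` of shape T x K, `Sigma` of shape T x K x K), and the synthetic test data are assumptions made for this example. In particular, the E-step outputs are replaced here by the true bits with zero posterior covariance merely to exercise the update, and `sigma2` denotes the noise variance estimate carried over from the previous iteration.

```python
import numpy as np

def gaussian_m_step(r, S, mu, Sigma, a_tilde, varsigma2, sigma2):
    """One M-step pass: amplitude update (75) followed by noise variance update (76).

    r: (T, N) observations, S: (N, K) signature matrix, mu: (T, K) posterior means,
    Sigma: (T, K, K) posterior covariances, a_tilde: (K,) noisy amplitude estimate,
    varsigma2: variance of the amplitude estimation error, sigma2: current noise variance.
    """
    T, N = r.shape
    K = S.shape[1]
    R = S.T @ S
    lhs = (sigma2 / varsigma2) * np.eye(K)   # prior contribution to the matrix in (75)
    rhs = (sigma2 / varsigma2) * a_tilde     # prior contribution to the vector in (75)
    for t in range(T):
        M = np.diag(mu[t])
        lhs += M @ R @ M + R * Sigma[t]      # "*" is the Hadamard product
        rhs += M @ (S.T @ r[t])
    a_hat = np.linalg.solve(lhs, rhs)        # amplitude estimate, eq. (75)

    total = 0.0
    for t in range(T):
        resid = r[t] - S @ (a_hat * mu[t])   # r_t - S A_hat mu_t
        total += resid @ resid + a_hat @ (R * Sigma[t]) @ a_hat
    sigma2_hat = total / (N * T)             # noise variance estimate, eq. (76)
    return a_hat, sigma2_hat

# Synthetic example (all numerical values are hypothetical).
rng = np.random.default_rng(2)
T, N, K = 200, 8, 3
S = rng.choice([-1.0, 1.0], size=(N, K)) / np.sqrt(N)
a_true = np.array([1.0, 0.8, 1.2])
bits = rng.choice([-1.0, 1.0], size=(T, K))
noise_var = 0.01
r = (bits * a_true) @ S.T + np.sqrt(noise_var) * rng.standard_normal((T, N))
a_tilde = a_true + 0.3 * rng.standard_normal(K)

a_hat, sigma2_hat = gaussian_m_step(r, S, bits, np.zeros((T, K, K)),
                                    a_tilde, varsigma2=0.09, sigma2=0.1)
print(a_hat, sigma2_hat)
```

Solving the linear system with `np.linalg.solve` rather than forming the inverse explicitly is a numerical convenience of this sketch; it yields the same estimate as the closed-form expression.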

5.3 Channel and Noise Variance Estimation for Discrete SISO MUD 

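Similar to Gaussian SISO MUD, the free energy of discrete SISO MUD for T channel outputs can
be written as

$$\mathcal{F}_{\mathrm{disc}}(\mathbf{m}_1, \cdots, \mathbf{m}_T, \sigma^2, \mathbf{a})
  = -\log p(\sigma^2) - \log p(\mathbf{a})
  + \sum_{t=1}^{T} \mathcal{F}_{\mathrm{disc}}(\mathbf{m}_t|\sigma^2, \mathbf{a}). \tag{77}$$
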
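Table 4: Variational EM algorithm employing Gaussian SISO MUD.

Initialization
    Set $\tilde{\mathbf{b}}_t = \mathbf{0}$, $\hat{\sigma}^{2(0)} = 0$, and $\hat{\mathbf{a}}^{(0)} = \tilde{\mathbf{a}}$.

FOR j = 1 : J (Outer Iteration)

    E Step (Update for Q(b))
    FOR k = 1 : K and t = 1 : T
        $\tilde{\mathbf{b}}_{t,k} = [\tilde{b}_{t,1}, \cdots, \tilde{b}_{t,k-1}, 0, \tilde{b}_{t,k+1}, \cdots, \tilde{b}_{t,K}]^T$
        $\mathbf{W}_t = \mathrm{diag}(\cdots)$
        $\mu_{t,k} = A_k \mathbf{e}_k^T [\mathbf{A}^T \mathbf{W}_t \mathbf{A} + \sigma^2 \mathbf{R}^{-1}]^{-1} [\mathbf{R}^{-1} \mathbf{y}_t - \mathbf{A} \tilde{\mathbf{b}}_{t,k}]$
        $\sigma^2_{t,k} = (1 - \tilde{b}_{t,k}^2) A_k^2 [(\mathbf{A}^T \mathbf{W}_t \mathbf{A} + \sigma^2 \mathbf{R}^{-1})^{-1}]_{kk}$
    END

    BCJR Decoding
    FOR k = 1 : K and t = 1 : T
        $\mathrm{LLR}_{dec}(b_{t,k}) \Leftarrow$ Decoding $\Leftarrow \mathrm{LLR}_{mud}(b_{t,k})$    (Extrinsic Information)
        $\tilde{b}_{t,k} = \tanh[\mathrm{LLR}_{dec}(b_{t,k})/2]$
        $\hat{b}_{t,k} = \tanh[\mathrm{LLR}_{mud}(b_{t,k})/2 + \mathrm{LLR}_{dec}(b_{t,k})/2]$    (Posterior Estimate)
        $\mu_{t,k} \leftarrow \hat{b}_{t,k}$
        $[\boldsymbol{\Sigma}_t]_{k,k} \leftarrow 1 - \hat{b}_{t,k}^2$
    END

    M Step (Update for $\sigma^2$ and $\mathbf{a}$)
        $\hat{\mathbf{a}} = \{ \sum_{t=1}^{T} [\mathbf{M}_t \mathbf{S}^T \mathbf{S} \mathbf{M}_t + (\mathbf{S}^T\mathbf{S}) \circ \boldsymbol{\Sigma}_t] + \frac{\sigma^2}{\varsigma^2} \mathbf{I} \}^{-1} ( \sum_{t=1}^{T} \mathbf{M}_t \mathbf{S}^T \mathbf{r}_t + \frac{\sigma^2}{\varsigma^2} \tilde{\mathbf{a}} )$
        $\hat{\sigma}^2 = \frac{1}{NT} \{ \sum_{t=1}^{T} [ (\mathbf{r}_t - \mathbf{S}\hat{\mathbf{A}}\boldsymbol{\mu}_t)^T (\mathbf{r}_t - \mathbf{S}\hat{\mathbf{A}}\boldsymbol{\mu}_t) + \hat{\mathbf{a}}^T [(\mathbf{S}^T\mathbf{S}) \circ \boldsymbol{\Sigma}_t] \hat{\mathbf{a}} ] \}$
END
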

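Here, $\mathcal{F}_{\mathrm{disc}}(\mathbf{m}_t|\sigma^2, \mathbf{a})$ is simply (46) with an additional
time index t. In (77), we set $p(\sigma^2)$ to a constant and let
$p(\mathbf{a}) = \mathcal{N}(\tilde{\mathbf{a}}, \varsigma^2 \mathbf{I})$, where $\tilde{\mathbf{a}}$ is
the noisy channel estimate. Making use of the E step already derived in Section 4.3, we only need to
complete the M step.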

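Utilizing Lemma 1 and ignoring the terms independent of $\sigma^2$ and $\mathbf{a}$, we have

$$
\begin{aligned}
\mathcal{F}_{\mathrm{disc}}&(\mathbf{m}_1, \cdots, \mathbf{m}_T, \sigma^2, \mathbf{a}) \\
&= \sum_{t=1}^{T} \left\{ \frac{N}{2} \log(\sigma^2)
   + \frac{1}{2\sigma^2} \mathbf{a}^T [\mathbf{M}_t \mathbf{S}^T \mathbf{S} \mathbf{M}_t - \mathrm{diag}(\mathbf{M}_t \mathbf{S}^T \mathbf{S} \mathbf{M}_t)] \mathbf{a}
   + \frac{1}{2\sigma^2} \mathbf{a}^T \mathrm{diag}(\mathbf{S}^T\mathbf{S}) \mathbf{a}
   - \frac{1}{\sigma^2} \mathbf{r}_t^T \mathbf{S} \mathbf{M}_t \mathbf{a} \right\} \\
&\quad + \frac{1}{2\varsigma^2} (\mathbf{a} - \tilde{\mathbf{a}})^T (\mathbf{a} - \tilde{\mathbf{a}}),
\end{aligned}
\tag{78}
$$

where $\mathbf{M}_t = \mathrm{diag}(\mathbf{m}_t)$. Equating
$\partial \mathcal{F} / \partial \mathbf{a} = \mathbf{0}$ produces

$$\hat{\mathbf{a}} = \left\{ \sum_{t=1}^{T} \left[ \mathrm{diag}(\mathbf{S}^T\mathbf{S})
   + \mathbf{M}_t [\mathbf{S}^T\mathbf{S} - \mathrm{diag}(\mathbf{S}^T\mathbf{S})] \mathbf{M}_t \right]
   + \frac{\sigma^2}{\varsigma^2} \mathbf{I} \right\}^{-1}
   \left( \sum_{t=1}^{T} \mathbf{M}_t \mathbf{S}^T \mathbf{r}_t + \frac{\sigma^2}{\varsigma^2} \tilde{\mathbf{a}} \right). \tag{79}$$

Substituting $\mathbf{a} = \hat{\mathbf{a}}$ into $\mathcal{F}$ and solving
$\partial \mathcal{F}_{\mathrm{disc}}(\mathbf{m}_1, \cdots, \mathbf{m}_T, \sigma^2, \hat{\mathbf{a}}) / \partial \sigma^{-2} = 0$
gives

$$\hat{\sigma}^2 = \frac{1}{NT} \sum_{t=1}^{T} \left[ (\mathbf{r}_t - \mathbf{S}\hat{\mathbf{A}}\mathbf{m}_t)^T (\mathbf{r}_t - \mathbf{S}\hat{\mathbf{A}}\mathbf{m}_t)
   + \sum_{k=1}^{K} (1 - m_{t,k}^2) \hat{A}_k^2 \cdot \mathbf{s}_k^T \mathbf{s}_k \right]. \tag{80}$$
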
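
The second term inside the noise variance update above comes from averaging the squared residual over independent binary bits with means $m_k$. As a sanity check, the following sketch (an illustration under assumed random test data; the dimensions, seed, and variable names are not from this paper) enumerates all $2^K$ bit vectors, computes the exact expectation of $\|\mathbf{r} - \mathbf{S}\mathbf{A}\mathbf{b}\|^2$ under $Q(\mathbf{b}) = \prod_k (1 + b_k m_k)/2$, and compares it with the closed form $\|\mathbf{r} - \mathbf{S}\mathbf{A}\mathbf{m}\|^2 + \sum_k (1 - m_k^2) A_k^2 \mathbf{s}_k^T \mathbf{s}_k$.

```python
import itertools
import numpy as np

rng = np.random.default_rng(3)
N, K = 6, 4                                  # small sizes so that 2^K enumeration is cheap
S = rng.standard_normal((N, K))              # arbitrary signature matrix (assumption)
a = rng.uniform(0.5, 1.5, size=K)            # channel amplitudes
m = rng.uniform(-0.9, 0.9, size=K)           # soft bit estimates, i.e. means of Q(b_k)
r = rng.standard_normal(N)                   # one received vector

# Exact expectation of ||r - S A b||^2 under the factorized Q(b).
expected = 0.0
for bits in itertools.product([-1.0, 1.0], repeat=K):
    b = np.array(bits)
    prob = np.prod((1.0 + b * m) / 2.0)      # Q(b) for this bit vector
    resid = r - S @ (a * b)
    expected += prob * (resid @ resid)

# Closed form: residual at the mean plus the per-user variance correction.
resid_m = r - S @ (a * m)
correction = np.sum((1.0 - m ** 2) * a ** 2 * np.sum(S ** 2, axis=0))
closed_form = resid_m @ resid_m + correction

# With hard decisions (m_k = +/-1) the correction term is exactly zero.
m_hard = np.sign(m)
correction_hard = np.sum((1.0 - m_hard ** 2) * a ** 2 * np.sum(S ** 2, axis=0))

print(np.isclose(expected, closed_form), correction_hard)
```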



The variational EM algorithm for flooding-discrete-SISO MUD is presented in Table 5.

Now consider taking a step backward and assume $\mathbf{a}$ to be known perfectly, so that only $\sigma^2$ needs to
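Table 5: Variational EM algorithm employing Discrete SISO MUD.

Initialization
    Set $\mathbf{m}_t = \mathbf{0}$, $\mathrm{LLR}_{dec}(b_{t,k}) = 0$, $\hat{\sigma}^{2(0)} = 0$, and $\hat{\mathbf{a}}^{(0)} = \tilde{\mathbf{a}}$.

FOR j = 1 : J (Outer Iteration)

    E Step (Update for Q(b))
    FOR i = 1 : I (Inner Iteration)
        FOR k = 1 : K and t = 1 : T
            $\mathrm{LLR}_{pos}(b_{t,k}) = \mathrm{LLR}_{dec}(b_{t,k}) + \cdots$
            $m_{t,k} = \tanh[\mathrm{LLR}_{pos}(b_{t,k})/2]$
        END
    END

    BCJR Decoding
    FOR k = 1 : K and t = 1 : T
        $\mathrm{LLR}_{mud}(b_{t,k}) = \mathrm{LLR}_{pos}(b_{t,k}) - \mathrm{LLR}_{dec}(b_{t,k})$
        $\mathrm{LLR}_{dec}(b_{t,k}) \Leftarrow$ Decoding $\Leftarrow \mathrm{LLR}_{mud}(b_{t,k})$    (Extrinsic Information)
        $\hat{b}_{t,k} = \tanh[\mathrm{LLR}_{mud}(b_{t,k})/2 + \mathrm{LLR}_{dec}(b_{t,k})/2]$    (Posterior Estimate)
        $m_{t,k} \leftarrow \hat{b}_{t,k}$
    END

    M Step (Update for $\sigma^2$ and $\mathbf{a}$)
        $\hat{\mathbf{a}} = \{ \sum_{t=1}^{T} [\mathrm{diag}(\mathbf{S}^T\mathbf{S}) + \mathbf{M}_t [\mathbf{S}^T\mathbf{S} - \mathrm{diag}(\mathbf{S}^T\mathbf{S})] \mathbf{M}_t] + \frac{\sigma^2}{\varsigma^2} \mathbf{I} \}^{-1} ( \sum_{t=1}^{T} \mathbf{M}_t \mathbf{S}^T \mathbf{r}_t + \frac{\sigma^2}{\varsigma^2} \tilde{\mathbf{a}} )$
        $\hat{\sigma}^2 = \frac{1}{NT} \{ \sum_{t=1}^{T} [ (\mathbf{r}_t - \mathbf{S}\hat{\mathbf{A}}\mathbf{m}_t)^T (\mathbf{r}_t - \mathbf{S}\hat{\mathbf{A}}\mathbf{m}_t) + \sum_{k=1}^{K} (1 - m_{t,k}^2) \hat{A}_k^2 \cdot \mathbf{s}_k^T \mathbf{s}_k ] \}$
END
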



be estimated. It is clear that, in (80), each element of $\mathbf{m}_t$ converges to $-1$ or $+1$ as
the algorithm converges. Hence, the last term in (80) eventually vanishes. Once the vanishing term is
omitted, (80) is exactly the equation used in [2] to estimate the noise-plus-MAI variance
$\sigma^2 + \sigma^2_{\mathrm{MAI}}$. Together with Section 4.5, we have now completed the
interpretation of Alexander-Grant-Reed's turbo detector as an instance of the variational EM
algorithm.



6 Simulation Results 

In this section, we investigate the performance of turbo multiuser detectors employing the Gaussian
SISO MUD and DDF-aided discrete SISO MUD (the original-form discrete SISO MUD would
suffer from poor convergence due to the non-convexity of $\mathcal{F}_{\mathrm{disc}}(\mathbf{m})$). We
consider two scenarios, so as to test the proposed schemes both in a standard benchmark setting and
in a more practical situation:

Scenario I: A flat-fading synchronous CDMA channel identical to that in [3]: A four-user system is
assumed with equal cross-correlation $\rho = 0.7$. All users have equal power and employ the
same rate-1/2 convolutional code with generators 10011 and 11101.

Scenario II: A flat-fading synchronous CDMA channel with random spreading: The system has a
spreading gain of N = 32 and K = 32 active users. All users have equal power and employ the
same rate-1/2 convolutional code with generators 111 and 101. In this case, we also assume
the receiver has noisy channel information (unknown $\sigma^2$ and inaccurate channel amplitude
estimates).

Since the focus of this paper is the introduction of a theoretical framework, these test results
are for proof-of-concept purposes only and are by no means complete. More elaborate
simulations, such as the near-far situation or the extensions to multipath channels, may be
performed following the same recipe presented in [3] and are omitted here.

6.1 Gaussian SISO MUD 

6.1.1 Scenario I with Perfect Channel Knowledge 

In Section 4.4.2 we proved that the Wang-Poor turbo MUD scheme is an instance of
hybrid-Gaussian-SISO MUD, whose performance is depicted in Fig. 3 of [3]. In this paper, we realize
the other two variations, namely sequential-Gaussian-SISO MUD and flooding-Gaussian-SISO
MUD, predicted by the theory of VFEM and the associated message passing rules. The complete
Gaussian SISO MUD algorithms are described in Table 1.

Fig. 5(a) and Fig. 5(b) plot the BER performance of turbo MUD employing sequential-Gaussian-SISO
MUD and flooding-Gaussian-SISO MUD, respectively, in simulation scenario I. The results
after each of the J = 5 outer iterations are recorded. It is shown that both schemes outperform
hybrid-Gaussian-SISO MUD, which was originally proposed outside of the variational inference
framework. The BER improvement, although small, verifies that the sequential and flooding
scheduling schemes are as sound in practice as they are in theory.

In the above simulations, we find that the difference in performance between the sequential schedule
and the flooding schedule is small. For conciseness, in the case of inaccurate channel estimates, we
will only consider the flooding schedule, since it in general leads to lower overall complexity and
latency at both the detection and decoding stages. That being said, one may choose to implement the
sequential or hybrid schedule when a special need arises.

6.1.2 Scenario II with Unknown Noise Variance and Inaccurate Channel Estimates 

We now consider simulation scenario II, a more challenging situation where the receiver is assumed
to have no noise variance information and only inaccurate channel estimates. The actual channel is
assumed to be Gaussian-distributed conditioned on the inaccurate estimate $\tilde{\mathbf{a}}$, i.e.,
$p(\mathbf{a}) = \mathcal{N}(\tilde{\mathbf{a}}, \varsigma^2 \mathbf{I})$, as in Section 5.2. In the
simulations, we fix the true channel $\mathbf{A} = \mathbf{I}$ (or $\mathbf{a} = \mathbf{1}$), and
generate the noisy channel estimate as $\tilde{a}_k = 1 + \delta_k$, where
$p(\delta_k) = \mathcal{N}(0, \varsigma^2)$.

Fig. 6 depicts the performance of flooding-Gaussian-SISO MUD implemented with joint $\mathbf{a}$ and
$\sigma^2$ estimation. We set $\varsigma$ to be 0.1, 0.3 and 0.4, respectively, to be compared with
the case of exact channel knowledge at the receiver. To be consistent, the curves plotted are the
results after J = 10 outer iterations, although in most cases convergence is achieved with fewer
iterations. It is seen that, with the help of the variational EM algorithm, this turbo multiuser
detector is very robust to severe channel estimation error, up to $\varsigma = 0.3$. Only when
$\varsigma$ reaches 0.4, i.e., 40% of the actual channel amplitude, does significant performance
degradation start to appear.

Figure 5: BER performance of turbo MUD employing Gaussian-SISO MUD (K = 4, $\rho$ = 0.7). (a)
Sequential schedule; (b) Flooding schedule.

Figure 6: BER performance of turbo MUD employing flooding-Gaussian-SISO MUD with joint
noise variance and channel estimation (N = 32, K = 32). The single user bound is obtained by
assuming perfect channel knowledge.

6.2 DDF-Aided Discrete SISO MUD 

6.2.1 Scenario I with Unknown Noise Variance Only 

To implement DDF-aided discrete SISO MUD in turbo MUD, we simply need to replace the first
outer iteration (j = 1) in the algorithms of Table 2 with the DDF update (63), and keep the remaining
outer iterations ($j \geq 2$) the same. We first simulate scenario I with turbo MUD employing
DDF-aided sequential-discrete-SISO MUD and DDF-aided flooding-discrete-SISO MUD, each having
I = 6 inner iterations within every outer iteration, except for the first outer iteration, where the
DDF update only requires 1 inner iteration. In the detection algorithms prescribed in Table 2, we
added a noise variance estimation step, as is done in [2], [31]. This is a special case of the
variational EM algorithm introduced in Section 5.3, with $\sigma^2$ alone being the unknown parameter.
Having the noise variance as an unknown does not seem to degrade the detector performance compared
to the case of perfectly known noise variance and, in certain cases, even helps. We attribute this
phenomenon to its possible relation with optimizing (46) via simulated annealing, with $\sigma^2$
acting as the temperature parameter [12], but will leave detailed discussion to future work.

Fig. 7(a) and Fig. 7(b) depict the BER performance of the above-mentioned schemes over
J = 5 or 6 outer iterations. Despite the slightly inferior performance compared to their Gaussian
SISO counterparts, the DDF-aided discrete SISO detectors are shown to produce excellent
results even with unknown noise variance. The existing discrete SISO detectors, such as [2], [31],
would fail under such simulation settings, due to the lack of DDF initialization.

6.2.2 Scenario II with Unknown Noise Variance and Inaccurate Channel Estimates 

We now further investigate the case of inaccurate channel estimates in addition to unknown noise
variance with simulation scenario II. As in the Gaussian SISO case, we set the true channel to be
$A_k = 1$, and generate noisy channel estimates by letting $\tilde{a}_k$ be $1 + \delta_k$, where
$p(\delta_k) = \mathcal{N}(0, \varsigma^2)$.

Fig. 8 depicts the performance of DDF-aided flooding-discrete-SISO MUD under channel
estimation errors of $\varsigma$ = 0.1, 0.3 and 0.4, respectively. It is shown that, by iteratively
refining the channel estimates, the modified flooding-discrete-SISO MUD is also robust to significant
errors in channel estimation.

Figure 7: BER performance of turbo MUD employing discrete-SISO MUD with joint noise variance
estimation (K = 4, $\rho$ = 0.7). (a) Sequential schedule; (b) Flooding schedule.

Figure 8: BER performance of turbo MUD employing flooding-discrete-SISO MUD with joint noise
variance and channel estimation (N = 32, K = 32). The single user bound is obtained by assuming
perfect channel knowledge.
7 Conclusions



The concept of free energy is a far-reaching one. Besides its original formulation in statistical
physics, it has also recently found application in interpreting various probabilistic inference
techniques, such as the EM algorithm [15] and belief propagation (sum-product algorithm) [43].

The main focus of this paper is the introduction of a comprehensive theory, centered around
the minimization of variational free energy, that describes various SISO detectors in multiple
access channels. In particular, we developed guidelines for SISO detector design in linear Gaussian
vector channels, first by pointing out the importance of message-passing scheduling, and next by
deriving detection algorithms accordingly. We show that it is a carefully chosen scheduling scheme
combined with its accompanying SISO detector that produces a good turbo detector, as opposed to
the conventional view that focuses on the detector design alone. With this new paradigm comes
a spectrum of plausible SISO detectors. In addition to new detectors constructed as a result, we
show that some of the classic SISO detectors can be seen as special instances of this composition,
and subsequently refined systematically.

In the algorithm design stage, after the postulation of prior and posterior distributions with
the help of some intuition, our efforts in obtaining good algorithms are condensed to the
evaluation and optimization of free energy expressions, such as the ones at the centre of this paper,
$\mathcal{F}_{\mathrm{gauss}}(\boldsymbol{\mu}, \boldsymbol{\Sigma})$ and
$\mathcal{F}_{\mathrm{disc}}(\mathbf{m})$. By viewing existing MUD schemes under
the same roof, we obtain many interesting insights. Furthermore, we extended the initiative to
variational-EM-based MUD, in which channel parameters may be efficiently and blindly estimated
in conjunction with turbo MUD.